METHOD AND SYSTEM FOR DATA CLASSIFICATION
IN THE PRESENCE OF A TEMPORAL NON-STATIONARITY

Field of the Invention 

The present invention relates to a method and system for classifying data, and
more particularly to a data classification method and system in the presence of a
temporal non-stationarity.

Background Information

Approaches for predicting the value of a dependent response variable based
on the values of a set of independent predictor variables have been developed by
practitioners in the art of statistical analysis and data mining for a number of
years. Also, a number of conventional approaches for modeling data have been
developed. These known techniques require a set of restrictive assumptions about the
data being modeled. These assumptions include, e.g., a lack of noise, statistical
independence, time invariance, etc. Therefore, if the real data being modeled is
dependent on certain factors which are contrary to the assumptions required for
accurate modeling by the conventional techniques, the results of the above-described
conventional data modeling would not be accurate.

This is especially the case in the presence of temporal, non-stationary data. 
Indeed, no robust approach which considers such data has been widely used or 
accepted by those in the art of the statistical analysis. For a better understanding of 
the difficulties with the prior art approaches, temporal data and non-stationary data 
are described below. 

Temporal data refers to data in which there exists a temporal relationship 
among data records which varies over time. This temporal relationship is relevant to 
the prediction of a dependent response variable. For example, the temporal data can 
be used to predict the future value of the equity prices, which would be based on the 
current and past values of a set of particular financial indicators. Indeed, if one
believes in the importance of trends in the market, it is not enough to simply consider
the current levels of these financial indicators; their relationships to the past
levels must also be considered.

In another example, a supermarket application may prefer to group certain
items together based on the purchasers' buying patterns. In such scenarios, the
temporal data currently used in such a supermarket application is the data provided for
each customer at the particular checkout, i.e., a single event. However, using the data
at the checkout counter for a single customer does not take into consideration the past
data for this customer (i.e., his or her previous purchases at the counter). In an
example of an intrusion detection system, the use of the time-varying data is very
important. For example, if a current login fails because the password was entered
incorrectly, this system would not raise any flags to indicate that an unauthorized
access into the system is being attempted. However, if the system continuously
monitors the previous login attempts for each user, it can determine whether a
predetermined number of failed logins occurred for the user, or if a particular
sequence of events occurred. Such an event may signify that an unauthorized access to
the system is being attempted.
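The login-monitoring idea described above can be illustrated with a minimal sketch. The class name `LoginMonitor`, the parameters `max_failures` and `window`, and the choice of a fixed-size history per user are all assumptions made for illustration; the text only states that a predetermined number of failed logins is tracked per user.

```python
from collections import defaultdict, deque


class LoginMonitor:
    """Flags a user once too many of the recent login attempts have failed.

    Hypothetical illustration: the window size and failure limit are
    assumed parameters, not values given in the text.
    """

    def __init__(self, max_failures=3, window=5):
        self.max_failures = max_failures
        # Keep only the last `window` login outcomes for each user.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, user, success):
        """Store one login outcome; return True if the user should be flagged."""
        self.history[user].append(success)
        failures = sum(1 for ok in self.history[user] if not ok)
        return failures >= self.max_failures


monitor = LoginMonitor(max_failures=3, window=5)
flags = [monitor.record("alice", False) for _ in range(3)]
```

A single failed login raises no flag, whereas the third failure within the window does, which is the distinction between single-event data and temporal data drawn in the paragraph above.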

Non-stationary data refers to data in which the functional relationship between
the predictor and response variables changes when moving from in-sample training
data to out-of-sample test data, either because of inherent changes in this relationship
over time, or because of some external impact. For example, with a conventional
network intrusion detection system, a predictive model of malicious network activity
can be constructed based on, e.g., TCP/IP log files created on a particular network,
such as the pattern formed from the previous intrusion attempts. However, as intruders
become more sophisticated in their attack scenarios, attack signatures will evolve. In
addition, the conventional intrusion detection systems may not be usable for all
conceivable current operating systems, much less for any future operating systems.
An effective intrusion detection system must be able to take these changes into
consideration.

One of the main difficulties being faced by the conventional predicting
engines is that the data is "multi-dimensional", which may lead to "over-fitting".
While it is possible to train the prediction system to make the predictions based on the
previous data, it would be difficult for this system to make a prediction based on both
new data and the data which was previously utilized to train the system. The
conventional systems utilize predictor values for each category of the data so as to
train themselves as described above. For example, if the prediction system intends to
predict the performance of certain baseball teams, it would not only use the batting
average of each player of the respective team, but also other variables such as the hitting
power of the respective players, statistics of the team while playing at home,
statistics of the team when it is playing away from home, injury statistics, age of the
players, etc. Each of these variables has a prediction variable associated therewith.
Using these prediction variables, it may be possible to train the system to predict the
performance of a given baseball team.

However, the conventional systems and methods described above are not
flexible enough to perform their predictions based on a new variable (e.g., the number
of players leaving the team) and a new corresponding prediction variable being utilized
for the analysis. In addition, it is highly unlikely that the data values being utilized by
the conventional systems and methods, i.e., after the system has already been trained,
are the same as or similar to the data of the respective prediction variables that were
already stored during the training of this system. The above-described example
illustrates what is known to those having ordinary skill in the art as "over-fitting". As
an example to illustrate this concept, the system may only be trained using training
data (e.g., in-sample data) which can represent only 0.1% of the entire data that this
system may be required to evaluate. Thereafter, the prediction model is built using
this training data. However, when the system is subjected to the real or test data (e.g.,
out-of-sample data), there may be no correlation between the training data and the real
or test data. This is because the system was only subjected to training using a
portion of the real/test data (e.g., 0.1%), and thus has never seen most of the real or test
data before.

There is a need to overcome the above-described deficiencies of the prior art
systems, methods and techniques. In particular, there is a need to provide a method and
system for classifying data that is temporal and non-stationary.

Summary of the Invention

A classification system and method according to the present invention offers
an approach for a prediction in the presence of temporal, non-stationary data which is
advantageous over the conventional systems and methods. The first exemplary step
of the system and method uses temporal logic for discovering features provided in the
data records. The next exemplary step is the classification of the data records using
the selected features. Another exemplary step of the system and method of the
present invention utilizes a "shrinkage technique" to reduce the undesirable effect of
"over-fitting".

Accordingly, a method and system according to the present invention are
provided for determining a feature of a particular pattern. Using this exemplary
system and method, data records are received, and predetermined patterns that are
associated with at least some of the data records are obtained. Using the system and
method, particular information is extracted from at least a subset of the received data
records, the particular information being indicative of the particular pattern for at least
some of the data records. Then, it is determined whether the particular pattern is an
unexpected pattern based on the obtained predetermined patterns. At least one record
of the data records may include temporal data and/or non-stationary data.

In another embodiment of the present invention, the predetermined patterns
are obtained by assigning a threshold, and correlating the data records into sets of
patterns as a function of the threshold. Also, the determination of whether the
particular pattern is an unexpected pattern includes a determination of whether the particular
pattern corresponds to at least one pattern of the sets of patterns. The positive
determination regarding the unexpected pattern can be made if the particular pattern
does not correspond to any pattern of the sets of patterns.

In yet another embodiment of the present invention, the unexpected pattern
can be indicative of an interestingness measure in the predetermined pattern. In
addition, the data records can include input sequences, and the input sequences can be
scanned to determine an interestingness measure of at least one event in the input
sequences. It is also possible to initialize a pattern list by inserting all events of the
input sequences therein. Then, from all patterns in the pattern list, a first pattern which
has a largest interestingness measure may be selected. The data records may include
a maximum allowable length value. Thus, the first pattern can be expanded to be a
second pattern. If a length of the second pattern is greater than the maximum
allowable value, the second pattern can be added to the pattern list. Thereafter, if a
length of the second pattern is less than or equal to the maximum allowable value, the
first pattern can be removed from the pattern list. These steps can be repeated until
the pattern list becomes empty. Finally, the particular pattern which includes the
interestingness measure can be output.

According to still another embodiment of the present invention, a pattern list
may be initialized by inserting all events of the input sequences therein, and at least
one suffix list can also be initialized. Locations of certain patterns of the input
sequences can be calculated, and previously discovered patterns may be updated
based on the calculated locations. The pattern list of the certain patterns can then be
updated. The data records can include a maximum allowable length value.

In another embodiment of the present invention, further records are generated
by modifying the data records to include additional features. Also, a functional model
is generated using the further records. A plurality of sets of the further records are also
generated, and the prediction model is generated for each set of the further records.
Furthermore, a single model can be generated based on each functional model of the
respective set of the further records.

According to yet another embodiment of the present invention, the data
records which have the unexpected pattern can be classified. Thereafter, a prediction
model is generated as a function of the classified data records. The classification of
the data records can be performed using a Multivariate Adaptive Regression Splines
technique. Then, data and/or parameters of at least one of the classified data records is
shrunk so as to determine a mean of the data and/or the parameters. The shrinking
technique can be a Stein's Estimator Rule technique.

Brief Description of the Drawings 

For a more complete understanding of the present invention and its
advantages, reference is now made to the following description, taken in conjunction
with the accompanying drawings, in which:

Figure 1 is an exemplary embodiment of a classification system according to the
present invention;

Figure 2 is a top level diagram of an exemplary embodiment of a method
according to the present invention, which can be performed by the classification
system of Figure 1;

Figure 3 is a flow diagram of a first exemplary feature selection technique of
the method according to the present invention which performs the feature selection by
utilizing a threshold to determine whether a particular pattern is an unexpected
pattern;

Figure 4A is a flow diagram of a second exemplary feature selection technique
of the method according to the present invention which performs the feature selection
based on an interestingness measure;

Figure 4B is a flow diagram of a third exemplary feature selection technique
of the method according to the present invention which performs the feature selection
based on suffix lists;

Figure 5 is an illustration of an exemplary implementation of the system and
method of the present invention by an intrusion detection system;

Figure 6 is a flow diagram of the exemplary embodiment of the method of the
present invention utilized by the intrusion detection system of Figure 5, in which a
prediction model is generated;

Figure 7 is another flow diagram of the exemplary implementation of the
method of the present invention by the intrusion detection system of Figure 5;

Figure 8 is an illustration of an exemplary implementation of the system and
method of the present invention by a disease classification system; and

Figure 9 is a flow diagram of the exemplary implementation of the method of
the present invention by the disease classification system of Figure 8.

Detailed Description 

Figure 1 illustrates an exemplary embodiment of a classification system 10
according to the present invention. In this drawing, the system 10 is connected to one
or more databases 20 for receiving an ordered set of data records. Each data record
preferably includes a set of features that may be relevant (given particular domain
knowledge) for predicting the value of a defined dependent variable. In addition, a
particular data record may also include certain relationships between itself and other
data records.

Upon the receipt of these data records, the system 10 according to the present
invention selects and/or extracts certain features from the data records, as shown in
step 100 of Figure 2, which illustrates an exemplary embodiment of the method
according to the present invention. These features may be temporal features that are
most relevant for predicting the value of the dependent variable. Then, in step 110,
the system 10 uses the method of the present invention to classify and modify the data
records received from the databases 20 based on the features that were extracted from
the data records and the classification thereof. Since the classified data records being
generated by step 110 are numerous, it is beneficial to shrink them (step 120 of
Figure 2). Thereafter, the data records that were selected as including or being part of
the particular patterns (when classified and shrunk) are used to generate a predictive
model in step 130 of Figure 2. Finally, the prediction model and/or the shrunk data
records and patterns are output. For example, Figure 1 illustrates that such output can
be provided to a printer 30 for generating hard copies of the predicted model or
shrunk data, forwarded to a display device 40, stored on a storage device 50, and/or
transmitted via a communications network 60 to another device (not shown in
Figure 1).

According to one exemplary embodiment of the present invention, the system
10 can be a general purpose computer (e.g., Intel processor-based computer), a special
purpose computer, a plurality of each and/or their combination. The storage device
50 can be one or more databases, one or more hard drives (e.g., stackable hard drives),
internal RAM, etc. The communications network 60 can be the Internet, Intranet,
extranet, or another internal or external network. It is also within the scope of the
present invention to receive the data records from the databases 20 via a
communications network, such as the Internet, Intranet, etc. The details of exemplary
embodiments of the present invention are provided below.

I. FEATURE SELECTION

To accomplish the extraction/selection of the features from the data records,
the classification system 10 searches and preferably selects certain patterns in the data
records which can be defined as having an "interestingness measure". The particular
interestingness measure used is preferably domain dependent, and in general, it is the
measure of how much the occurrence of the pattern correlates with the occurrence of a
single value of the predicted variable. The determination of the interestingness
measure can be useful in a number of examples, such as, e.g., for a network intrusion
detection. When searching for patterns that characterize malicious activity on the
network, it is not sufficient to monitor the patterns that occur frequently in the presence
of an attack; the patterns to be selected are those which occur more frequently during
an attack than during the normal network activity.

The above-described example defines at least one "interestingness feature"
which can be used by the system and method of the present invention for monitoring
the patterns of the data records having this measure, and selecting the corresponding
patterns therefrom. For example, the interestingness measure for the network
intrusion system may be a ratio of a number of occurrences of the particular pattern
during the course of intrusion to the number of occurrences of this pattern during the
course of normal network behavior. This interestingness measure, unlike the
frequency, enables an identification of patterns that are non-frequent and yet highly
correlated with intrusive behavior, and provides a way to ignore patterns which occur
frequently during an intrusion, but occur just as frequently during normal behavior.
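The ratio-based interestingness measure described above can be sketched as follows. The function name `interestingness_ratio` and the handling of a zero count during normal behavior are assumptions made for illustration; the text does not say how that edge case is treated.

```python
def interestingness_ratio(attack_count, normal_count):
    """Ratio of pattern occurrences during intrusion to occurrences
    during normal network behavior.

    Assumption (not from the text): a pattern never seen during normal
    behavior is given an infinite ratio if it was seen during an attack,
    and a ratio of 0.0 if it was never seen at all.
    """
    if normal_count == 0:
        return float("inf") if attack_count > 0 else 0.0
    return attack_count / normal_count


rare_but_telling = interestingness_ratio(5, 1)          # infrequent, yet 5x more common in attacks
frequent_everywhere = interestingness_ratio(1000, 1000)  # frequent, but equally so in normal traffic
```

In this sketch the infrequent pattern scores 5.0 while the frequent one scores 1.0, which is the distinction from a pure frequency measure that the paragraph above draws.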

In another example of the network intrusion detection, the dependent variable
that may be used for the interestingness feature can have a value between 0 and 1,
which represents the probability that the associated data record is a part of
the intrusion. In this exemplary case, the interestingness measure of a pattern P is
denoted as:

I(P) = Pr(Intrusion | P).

The interestingness measure of the pattern P would, in this case, be the
probability that the particular data record is part of the intrusion given that the pattern
P occurred. Using a predefined interestingness threshold T, the following sets of
patterns can be included in the data records as additional features:

S1 = {P | I(P) > T}, S2 = {P | I(P) < 1-T}, S3 = {P | ¬P ∈ S2}

For example, set S1 may represent the most interesting patterns. In the case of
the intrusion detection, set S1 may be defined as a set of patterns that are most highly
correlated with the intrusion based on the training of the prediction model using
in-sample data. Set S2 may include the least interesting patterns, or in the exemplary
intrusion detection, set S2 may represent the most highly correlated patterns with a
normal behavior, also based on the training of the prediction model using in-sample
data. Set S3 may have the patterns whose negation is provided in set S2. The purpose
of set S3 is to aid in the mitigation of the effects of non-stationarity.
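The construction of the sets S1, S2 and S3 from a threshold T can be sketched as follows. The string encoding of patterns, the leading "~" used to mark a negated pattern, and the names `negate` and `build_pattern_sets` are hypothetical choices made only for this illustration.

```python
def negate(pattern):
    """Hypothetical negation encoding: a leading '~' marks a negated pattern."""
    return pattern[1:] if pattern.startswith("~") else "~" + pattern


def build_pattern_sets(interestingness, threshold):
    """Split patterns into S1, S2 and S3 given I(P) values in [0, 1]."""
    s1 = {p for p, i in interestingness.items() if i > threshold}
    s2 = {p for p, i in interestingness.items() if i < 1 - threshold}
    # P belongs to S3 exactly when its negation belongs to S2.
    s3 = {negate(q) for q in s2}
    return s1, s2, s3


# Assumed example values of I(P) = Pr(Intrusion | P) for three patterns.
scores = {"repeated_failed_login": 0.95, "routine_browse": 0.02, "mixed": 0.50}
s1, s2, s3 = build_pattern_sets(scores, threshold=0.9)
```

With these assumed scores, S1 holds the pattern most correlated with intrusion, S2 the pattern most correlated with normal behavior, and S3 the negation of the S2 pattern.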

For example, in the intrusion detection scenario, the system 10 and method
according to the present invention take into consideration the situation in which the
out-of-sample data set contains an intrusion that was not present in the in-sample data
on which the model was based. Thus, as illustrated in Figure 3, an exemplary
embodiment of the present invention provides that the system 10 receives an ordered
set of data records which includes the data records used for accessing the network
(step 200), and assigns a predetermined interestingness threshold T to be applied to
these data records (step 210). The data records are then correlated so that particular
sets of patterns are associated therewith, based on the threshold T (step 220). In step
230, it is then determined whether the current pattern (e.g., a predetermined number
of unsuccessful logins to the network) corresponds to the first type of an expected
event that is provided in set S1. It would not be expected that the patterns that are
part of this novel attack would be in set S1, since set S1 contains the patterns associated
with only those attacks present in the training data (e.g., which used the in-sample
data for generating the prediction model). If the current pattern corresponds to the
patterns in set S1, then the pattern is assigned as being of the first type in step 240,
i.e., definitely an intrusion attack on the network. Otherwise, it is determined (in
step 250) whether the current pattern corresponds to the second type of an expected
event that is provided in set S2.

If the current pattern corresponds to the patterns in set S2, then the pattern is
assigned as being of the second type in step 260, i.e., definitely not an intrusion attack
on the network. It would not be expected that the patterns that are part of this novel
attack would be in set S2, because set S2 contains the patterns that are associated with
a normal behavior of the network (as trained by the in-sample data). However, if the
current pattern does not correspond to set S1 or set S2, then there is a pattern that does
not neatly fit into any known set of patterns, and is thus treated as a novel attack. This
pattern would not be considered as being a normal behavior on the network.
According to this exemplary embodiment of the system and method according to the
present invention, the pattern(s) present in the above-described novel attack are
considered as deviating from the patterns provided in set S2. Therefore, the current
pattern has to be the third type of event, i.e., an unexpected (or interesting) event,
which should be part of the set S3 of patterns that were in neither set S1 nor in set S2.
Thus, in step 270, the current pattern is set as including an interestingness feature so
as to identify its behavior as deviating from what is considered as the normal behavior
on the network, even if this [...]. After
the current pattern is set as described above with reference to steps 240, 260, 270, the
determination regarding the type of the event (of the current pattern) is output in
step 280.
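The decision flow of steps 230 through 280 can be sketched as a simple three-way test. The function name `classify_pattern` and the string labels it returns are assumptions for illustration; the sets S1 and S2 are taken as already built from the in-sample data.

```python
def classify_pattern(pattern, s1, s2):
    """Assign the current pattern to one of the three event types.

    s1: patterns associated with known attacks (in-sample data).
    s2: patterns associated with normal behavior (in-sample data).
    """
    if pattern in s1:
        return "intrusion"    # first type, step 240
    if pattern in s2:
        return "normal"       # second type, step 260
    return "unexpected"       # third type, step 270 (possible novel attack)


# Hypothetical pattern names, used only to exercise the three branches.
known_attacks = {"repeated_failed_login"}
known_normal = {"routine_browse"}
result = classify_pattern("port_sweep_then_login", known_attacks, known_normal)
```

A pattern that appears in neither set is labeled unexpected rather than normal, which is how this flow accounts for an attack that was absent from the training data.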

Given that the data records are populated with both a set of basic features as
well as the derived features, namely temporal patterns, a classifier based on this data
can be generated.

From the above-described exemplary method of the present invention, it
should be understood that an interestingness measure for the
records could be defined as marking such patterns as "unexpected" patterns. To find
unexpected patterns, it may be preferable to first define these patterns in terms of
temporal logic expressions, in sequences of the data records. For example, it is
possible to assume that each event in each data record in the sequence occurs with
some probability, and that certain conditional distributions on the neighboring events
are present. Based on such predicates, it is possible to compute an expected number
of occurrences of a certain pattern in a sequence. If the actual number of the
occurrences of a particular pattern significantly differs from the expected number of
the occurrences, then this particular pattern would be considered "unexpected" and
therefore interesting.

To determine the expected number of the occurrences of the particular pattern
P, it may be preferable to assign a probability distribution over the events according to
one exemplary embodiment of the present invention. In general, certain problem
domains may suggest a preferable technique to evaluate these expectations rather than
by calculating them as a function of the frequencies of individual events. In the
exemplary network intrusion detection setting, it is possible to calculate the expected
number of the occurrences of the particular pattern P during the attack on the network
based on the frequency of the particular pattern P during the normal activity on the
network. In other settings, i.e., different than the network intrusion detection, other
techniques for determining the expectations may be appropriate. The underlying issue
solved by the system and method of the present invention is whether, given any
technique for computing the expectations for the particular pattern, it is possible to
efficiently identify interesting or unexpected patterns using the retrieved data.

In one exemplary technique of the method according to the present invention,
all unexpected patterns can be found, a pattern being unexpected if, e.g., the ratio of
the actual number of occurrences to the expected number of occurrences exceeds a
certain threshold. This exemplary technique is illustrated in Figure 4A. First, input
string(s)/sequence(s) 305, event probabilities 306, a threshold T for the interestingness
measure 307 and a number for a maximum allowable pattern length ("MAXL") 308
are provided to the system 10. The event probabilities 306 may be determined for each
atomic event. The threshold T 307 may be a value that, if exceeded by the
interestingness measure of a pattern, deems the pattern to be interesting. It is also
possible to input a user-defined constant to the system 10 which determines the
maximum number of events that a particular event or data record can precede another
event or data record. Then, in step 310, the input string(s)/sequence(s) are scanned to
determine the interestingness measure of each event therein. In step 315, a list L that
includes all these events is initialized. From all patterns provided in the list L, a
particular pattern C is selected which has the largest interestingness measure to be the
next pattern for expansion (step 320).

Then, in step 325, this particular pattern C is expanded by scanning the
input string(s)/sequence(s) to detect the occurrences of the particular pattern C. When
an occurrence of the pattern C is detected, the particular pattern C is expanded as a
prefix and as a suffix, i.e., all occurrences of (C Op X) and (X Op C) are recorded, where
X is also a pattern, "Op" ranges over the temporal operators, and X ranges over all
events. Thereafter, the interestingness or unexpectedness of all newly
discovered patterns C is determined, i.e., by the system 10 as described below.

In step 330, it is determined whether the length of the newly discovered
patterns C is smaller than the maximum allowable length (MAXL), and if so, the
newly discovered patterns C can be removed from the list L (step 340). Otherwise,
the particular pattern C is removed from the list L in step 335. In step 345, it is
determined whether the list L is empty. If not, the processing of this exemplary
technique of the method according to the present invention is returned to step 320.
Otherwise, in step 350, the interesting patterns are output, e.g., to
the printer 30, the display device 40, the storage device 50 and/or the communications
network 60.
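The best-first expansion of Figure 4A can be illustrated with a heavily simplified sketch. Several assumptions are made that are not taken from the text: only a single "next" operator (immediate succession, written N elsewhere in this description) is modeled, events are treated as independent so that the expected count of a length-k pattern is the product of its event probabilities times (N - k + 1), and the handling of the list L is one possible reading of steps 330 to 340, namely that newly discovered patterns shorter than MAXL are placed on the list for later expansion while the selected pattern C is taken off it. All function names are hypothetical.

```python
import math


def count_occurrences(seq, pattern):
    """Number of positions at which `pattern` occurs contiguously in `seq`."""
    k = len(pattern)
    return sum(1 for i in range(len(seq) - k + 1) if tuple(seq[i:i + k]) == pattern)


def interestingness(seq, pattern, probs):
    """Actual occurrences divided by expected occurrences (independence assumed)."""
    expected = math.prod(probs[e] for e in pattern) * (len(seq) - len(pattern) + 1)
    return count_occurrences(seq, pattern) / expected if expected > 0 else 0.0


def find_unexpected(seq, probs, threshold, maxl):
    events = sorted(set(seq))
    # Steps 310-315: score every single event and initialize the list L.
    score = {(e,): interestingness(seq, (e,), probs) for e in events}
    worklist = set(score)
    while worklist:                                    # step 345
        # Step 320: pick the pattern with the largest interestingness.
        best = max(worklist, key=lambda p: (score[p], p))
        worklist.remove(best)                          # assumed reading of step 335
        for e in events:                               # step 325: prefix and suffix
            for cand in ((e,) + best, best + (e,)):
                if len(cand) > maxl or cand in score:
                    continue
                if count_occurrences(seq, cand) == 0:
                    continue
                score[cand] = interestingness(seq, cand, probs)
                if len(cand) < maxl:                   # assumed reading of steps 330-340
                    worklist.add(cand)
    # Step 350: output the patterns whose measure exceeds the threshold T.
    return {p: s for p, s in score.items() if s > threshold}


seq = "ABABABAB" + "C" * 12
probs = {"A": 0.25, "B": 0.25, "C": 0.50}
unexpected = find_unexpected(seq, probs, threshold=2.0, maxl=2)
```

Under these assumptions no single event exceeds the threshold, while several two-event patterns do, so an expansion strategy cannot discard a pattern merely because its components are uninteresting.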

In another exemplary embodiment of the present invention, it is possible to
start with small patterns, and expand only those patterns that offer the potential of
leading to the discovery of interesting/unexpected, larger patterns. Using this exemplary
technique, it is preferable to first find all patterns that occur relatively frequently,
given a class of operators, an input sequence of events, and a frequency threshold.
The exemplary technique for solving this problem has two alternating phases:
building new candidate patterns, and counting the number of occurrences of these
candidates.

The efficiency of this exemplary technique is based on two observations:
a. Where there are potentially a large number of patterns that have to be
evaluated, the search space can be dramatically pruned by building
large patterns from smaller ones in a prescribed way. For example, if a
pattern "αNβNγ" is frequent, then the patterns "αNβ" and "βNγ"
must also be frequent. Thus, for a pattern P to be frequent, its sub-patterns
should also be frequent. The exemplary technique for
identifying frequent patterns can take advantage of this fact by
considering a pattern of size n only if its prefix and suffix of size n-1 are
themselves frequent.

b. All complex patterns can be the result of recursively combining other
smaller patterns. For example, in order to efficiently count the number
of occurrences of the pattern "αNβBkδBkγ", it is preferable to identify
the number of occurrences and location of the two patterns "αNβ" and
"δBkγ", and to have an efficient way for combining the patterns via the
Bk operator. In general, since all of the exemplary operators can be
binary, when combining two patterns with operator Op to create a
larger pattern and determine the number of occurrences of the resulting
pattern, it is preferable to determine the number and locations of Op's
two operands, and to provide an efficient way for locating patterns of
the form A Op B.
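Observation (b) can be illustrated by a sketch that combines the stored locations of two operand patterns instead of rescanning the input. The representation of an occurrence as a (start, end) index pair, the function names, and in particular the reading of Bk as "the right operand begins between 1 and k positions after the left operand ends" are assumptions made for this sketch; this section does not define the operators formally. Under that reading the N operator is the special case k = 1.

```python
def occurrences_of_event(seq, event):
    """Locations of a length-1 pattern, as (start, end) index pairs."""
    return [(i, i) for i, e in enumerate(seq) if e == event]


def combine(locs_a, locs_b, max_gap):
    """Locations of 'A Op B' given the locations of A and of B.

    Assumed semantics: B must start 1..max_gap positions after A ends.
    max_gap=1 models N (next); max_gap=k models Bk.
    """
    # Index the right operand by start position for constant-time lookup.
    ends_by_start = {}
    for start, end in locs_b:
        ends_by_start.setdefault(start, []).append(end)

    combined = []
    for start_a, end_a in locs_a:
        for gap in range(1, max_gap + 1):
            for end_b in ends_by_start.get(end_a + gap, []):
                combined.append((start_a, end_b))
    return combined


seq = "ABABABAB" + "C" * 12
locs_a = occurrences_of_event(seq, "A")
locs_b = occurrences_of_event(seq, "B")
a_next_b = combine(locs_a, locs_b, max_gap=1)      # A N B
a_before3_b = combine(locs_a, locs_b, max_gap=3)   # A B3 B under the assumed semantics
```

Because the result is again a list of (start, end) pairs, it can itself be used as an operand, which mirrors the recursive combination of smaller patterns described in observation (b).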

The exemplary technique according to the present invention initially counts
the number of occurrences of length 1 patterns (i.e., the length of a pattern is the
number of events that occur in it). Thereafter, a candidate set for the next iteration of
discovery is computed by combining, in a pair-wise manner, all frequent length-1
patterns via each operator. For example, in the nth iteration, the combination of the
patterns of length n-1 and length 1 can be added to the candidate set provided that the
length (n-1) prefix and suffix of the resulting length n pattern have already been
deemed frequent in the previous iteration. Then, during the discovery phase, the
number and location of the occurrences of the candidate length n patterns can be
determined given the locations of their length n-1 prefixes and length 1 suffixes. This
process continues until the candidate set (or list) becomes empty. The memory
requirements of this exemplary technique are minimized because once a pattern is
deemed as being infrequent, it can never result in being the sub-pattern of a larger
frequent pattern, and can therefore be discarded. Such property may not hold in view
of the definition of interestingness provided above, as shall be discussed in further



WO 01/88834 



PCT/USOI/ISHO 






detail below. In particular, a pattern can be unexpected while its component
sub-patterns may be expected. This feature of the interestingness measure can be
understood using the following example:

Let the set of events be E = {A, B, C}. Assume that the probabilities of these
events are Pr[A] = 0.25, Pr[B] = 0.25 and Pr[C] = 0.50. Also assume that these events are
independent. Let the interestingness threshold T = 2, i.e., for a pattern to be
interesting, the value of the actual number of occurrences of the pattern divided by the
expected number of occurrences of the pattern should preferably exceed 2. For
example, the following string of events can be input into the system 10:

ABABABABCCCCCCCCCCCC (the length of this string being N = 20)

Given the above-mentioned probabilities, E[A] = 5 and E[B] = 5, and the expression for
computing expectations for patterns of the form ANB gives:

$$E[ANB] = \Pr[A]\,\Pr[B]\,(N-1) = (0.25)(0.25)(19) = 1.1875$$

Since A[A] = 4 and A[B] = 4, both of the events A and B were not interesting (in fact, the
actual number of occurrences of these events was less than what was expected), but the
pattern ANB, which occurred 4 times, was interesting with

$$\frac{A[ANB]}{E[ANB]} = \frac{4}{1.1875} \approx 3.37$$
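
For illustration only, the arithmetic of the above example may be checked with the following minimal Python sketch. The function name `next_pattern_interestingness` and the variable names are hypothetical and are not part of the described system; the sketch assumes independent events and the "next" operator N only.

```python
from collections import Counter


def next_pattern_interestingness(events, probs, first, second):
    """Return (actual, expected, ratio) for the pattern `first N second`."""
    n = len(events)
    # Actual count: positions where `first` is immediately followed by `second`.
    actual = sum(
        1 for i in range(n - 1) if events[i] == first and events[i + 1] == second
    )
    # Expected count under independence: Pr[first] * Pr[second] * (N - 1).
    expected = probs[first] * probs[second] * (n - 1)
    return actual, expected, actual / expected


events = "ABABABAB" + "C" * 12  # the 20-event input string of the example
probs = {"A": 0.25, "B": 0.25, "C": 0.50}

# Interestingness of the single events: A[e] / E[e], with E[e] = Pr[e] * N.
counts = Counter(events)
single = {e: counts[e] / (probs[e] * len(events)) for e in probs}

actual, expected, ratio = next_pattern_interestingness(events, probs, "A", "B")
print(single["A"], single["B"], actual, expected, round(ratio, 2))
```

With the threshold T = 2, neither A nor B alone exceeds the threshold, while the ratio for ANB does, which is the lack of monotonicity discussed in the text.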

This lack of monotonicity in the interestingness measure can result in a
significantly more complex problem, specifically in terms of space complexity. In the
exemplary technique for discovering frequent patterns, significant pruning of the
search space may occur with each iteration. That is, when a newly discovered pattern
is found to have occurred fewer times than the frequency threshold, it may be
discarded, because adding new events to it cannot result in a frequent pattern.
This is not the case when the interestingness measure is used: the addition of an event to
an uninteresting pattern can result in the discovery of an interesting pattern.
This inability to prune the discovered patterns leads to a large increase in the
amount of space required to find unexpected patterns.

Another exemplary technique of the method according to the present invention
for finding unexpected patterns, which involves sequential scans over the string of events
and discovers new patterns with each scan, is illustrated in Figure 4B. To summarize
this exemplary technique, a list is maintained of those patterns that were discovered
previously, and on each subsequent iteration of this technique, the "best" pattern is
selected from this list for expansion to be the seed for the next scan. Described below
is an exemplary method to determine which pattern is the "best" pattern.

The "best" pattern can be defined as a pattern that is most likely to produce an
interesting pattern during the expansion. By expanding an already interesting
pattern, it is possible, and even likely, to discover additional interesting pattern(s).
However, it should still be determined which is the best candidate for the expansion
among the interesting patterns already discovered. If no interesting patterns remain
unexpanded, it is determined whether there are any uninteresting patterns worth
expanding.

According to this exemplary embodiment of the present invention, input
string(s)/sequence(s) 355, event probabilities 356, a threshold T for the
interestingness measure 357, a number for a maximum allowable pattern length
("MAXL") 358 and a value "MIN_TO_EXPAND" 359 are provided to the system 10.
The MIN_TO_EXPAND value is preferably the minimum threshold of expected
interestingness that a pattern should have in order to become the next pattern to be
expanded. Then, a scan of the input string(s)/sequence(s) takes place, in which the number of
occurrences (and therefore, the frequencies) of individual events are counted to
determine the interestingness and location of each event (step 360). This scan (e.g., a
linear scan) is a scan of the "DL" events that occur in the record string(s)/sequence(s),
where "D" is the number of data records and "L" is the number of fields in each data
record.

In step 365, the list of patterns is initialized with the set of discovered patterns.
For example, R lists should be initialized at this stage, where R is the number
of temporal operators that are used. Each list may represent the pattern form X,
where X is an arbitrary literal. One sorted list can be stored for each temporal
operator. The processing time and capacity required for this initialization
correspond to the processing time and capacity of sorting these lists. Initially, all
lists can be sorted in an identical order. Therefore, the total processing time and
capacity of this initialization may be defined by O(N log N), where N is the number
of distinct events in the database. Each literal α, in each list, has an initial candidacy
value of:

$$\frac{A[\alpha]}{\Pr[\alpha]}$$

where A[α] is the number of occurrences of α, which can be determined in the initial
scan.
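
The initialization of step 365 may be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `initial_candidacy_list` and `initialize_pattern_lists` are hypothetical, ties are broken alphabetically for determinism, and the operator labels are plain strings.

```python
from collections import Counter


def initial_candidacy_list(events, probs):
    """Sort literals by their initial candidacy value A[a] / Pr[a], largest first."""
    counts = Counter(events)
    scored = [(counts[literal] / probs[literal], literal) for literal in probs]
    # One O(N log N) sort over the N distinct events.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


def initialize_pattern_lists(events, probs, operators):
    """Create one sorted list per temporal operator; all start in the same order."""
    base = initial_candidacy_list(events, probs)
    return {op: list(base) for op in operators}


events = "ABABABAB" + "C" * 12
probs = {"A": 0.25, "B": 0.25, "C": 0.50}
lists = initialize_pattern_lists(events, probs, ["N", "B_K", "U"])
print(lists["N"])
```

Because every list starts from the same candidacy values, the sort is performed once and copied, which is consistent with the O(N log N) cost stated above.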

Then, in step 370, the suffix lists are initialized. For example, the R lists are
preferably initialized at this stage, where R is the number of temporal operators that
can be predefined or defined by a user. Each such list contains the potential suffixes
for all length 2 patterns. Each of these lists would again be sorted based on their
candidacy values. Initially, these candidacy values are the same as those for the set of
discovered patterns (described above for step 365), and therefore no additional sorting
is necessary. The total processing time and capacity of this initialization can be
defined as O(N).

In step 375, the pattern locations are calculated. As described above, it is
possible to compute the locations of the pattern resulting from combining the pattern
P with a literal α via the operator "Op" via a linear scan of the location lists for the
pattern P and the literal α. The total number of operations that should be performed
for this computation is proportional to the longer of these two location lists. This has
an expected value of:

$$\frac{DR}{N}$$

where D is the number of data records, R is the number of temporal operators, and N
is the number of distinct events in the database.
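
For the "next" operator N, the linear scan of step 375 may be sketched as a merge of two sorted location lists. The function name `locations_next` is hypothetical, and the sketch assumes that a pattern's location is recorded as the 0-based index of its final event; neither detail is specified in the text.

```python
def locations_next(pattern_end_locs, literal_locs):
    """Return end locations of `P N a`, given sorted end locations of P and locations of a."""
    result = []
    i = j = 0
    # Single pass over both lists: cost is proportional to the longer list.
    while i < len(pattern_end_locs) and j < len(literal_locs):
        target = pattern_end_locs[i] + 1  # the literal must occur right after P
        if literal_locs[j] < target:
            j += 1
        elif literal_locs[j] > target:
            i += 1
        else:
            result.append(literal_locs[j])
            i += 1
            j += 1
    return result


events = "ABABABAB" + "C" * 12
a_locs = [i for i, e in enumerate(events) if e == "A"]
b_locs = [i for i, e in enumerate(events) if e == "B"]
c_locs = [i for i, e in enumerate(events) if e == "C"]

ab_ends = locations_next(a_locs, b_locs)    # occurrences of A N B
abc_ends = locations_next(ab_ends, c_locs)  # occurrences of A N B N C
print(ab_ends, abc_ends)
```

The result of one scan can be fed directly into the next, so a length n pattern is located from its length n-1 prefix and a length 1 suffix without rescanning the input string.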

Then, the already discovered patterns are updated in step 380. Given that the
locations of the candidate P Op α have been previously computed, this step entails
two substeps. In the first substep, the newly discovered patterns are inserted into the

the printer 30, the display device 40, the storage device 50 and/or the communications
network 60, and the processing of this exemplary method is completed.

For example, the exemplary technique described above with reference to
Figure 4B continues to expand the best candidates of the unexpected patterns until there
are no more candidates that are worthy of expansion. To further explain this concept,
the following definitions can be utilized:

Definition I: The FORM(P) of the pattern P is a logical expression with all ground
terms in the pattern P replaced by variables. For example, if
P = α N β B_K γ B_K δ, then FORM(P) = W N X B_K Y B_K Z.
Given the length of the input string(s)/sequence(s), it is possible to
determine the number of patterns of each form in the input
string/sequence. For example, given a string of length M, the number
of patterns of the form X N Y is M-1. The number of patterns of the form X B_K Y is
(M-K)K + (K)(K-1)/2.
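
The two counts given in Definition I may be verified by brute force, as in the following illustrative sketch. It assumes that X B_K Y holds when Y occurs between 1 and K positions after X and that K does not exceed M; the function names are hypothetical.

```python
def count_next_form(m):
    """Number of position pairs matching the form X N Y in a string of length m."""
    return m - 1


def count_before_form(m, k):
    """Closed form from Definition I: (M-K)K + K(K-1)/2."""
    return (m - k) * k + k * (k - 1) // 2


def brute_force_before(m, k):
    """Enumerate all pairs (i, j) with 1 <= j - i <= k."""
    return sum(1 for i in range(m) for j in range(i + 1, m) if j - i <= k)


m, k = 20, 3
print(count_next_form(m), count_before_form(m, k), brute_force_before(m, k))
```

For M = 20 and K = 3 both the closed form and the enumeration give 54, since the first M-K positions each admit K successors and the last K positions admit K-1, K-2, ..., 0.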

Definition II: Given the pattern P and an operator Op, ActualRemaining(P Op X) is
the number of patterns of the form P Op X that have yet to be
expanded. This value is maintained for each operator Op and the
pattern P. That is, a value for P N X, P B_K X, X B_K P, etc. is maintained, where
X ranges over all events. For example, if there are 20 occurrences of
P = α B_K β in the input string and 5 patterns of the form α B_K β N X have
been discovered so far, then ActualRemaining(α B_K β N X) = 15.

The following heuristic can be used to determine which discovered pattern is
the best pattern to use for the expansion. Given an arbitrary literal δ, the best pattern
P for the expansion is preferably the pattern for which the value of
E[A[P Op δ] / E[P Op δ]] is the maximum for some δ.

This heuristic can be a probabilistic statement that the pattern P (which is most likely
to result in the discovery of an interesting pattern) is the pattern for which there exists
a literal δ such that the expected value of the interestingness measure of the
pattern generated when the literal δ is added to the pattern P via one of the temporal
operators Op is the highest over all discovered patterns P, literals δ and operators Op.
It is preferable to use the expected value of the interestingness measure because,
although the actual number of occurrences of both the pattern P and the literal δ is
known, the number of occurrences of P Op δ is not known. This expectation is
preferably computed directly from the previously-described derivations of
expectations, and can be described using the following example:

If P = α N β, and Op is "next", then

$$E\!\left[\frac{A[P N \delta]}{E[P N \delta]}\right] = \frac{(\#P_N)\,(FR(\delta))}{\Pr[\alpha]\,\Pr[\beta]\,\Pr[\delta]\,(K-2)}$$

where K = length of the input string,
FR(δ) = frequency of the literal δ that could complete the pattern _N_NX, and
#P_N = number of occurrences of the pattern P yet to be expanded via the operator N.

If Op is "before", then

$$E\!\left[\frac{A[P B_K \delta]}{E[P B_K \delta]}\right] = \frac{(\#P)\,(FR(\delta))\,(\mathrm{BEFOREK})}{\Pr[\alpha]\,\Pr[\beta]\,\Pr[\delta]\,(K-2)\,(\mathrm{BEFOREK})} = \frac{(\#P)\,(FR(\delta))}{\Pr[\alpha]\,\Pr[\beta]\,\Pr[\delta]\,(K-2)}$$

where BEFOREK is a user defined variable that is equal to the
maximum distance between two events X and Y for X B_K Y to hold.
Similar arguments can be used for any combination of the operators Op of "before",
"next", and "until". In general, the candidate pattern P, the suffix literal δ and the
operator Op are chosen whose combination is most likely to result in the discovery
of an interesting pattern.

Throughout the above-described technique and with reference to Figure 4B,
two data structures should be used to efficiently compute the best candidates on each
subsequent iteration.

a. An ((N+1) x M) matrix, where N is the number of distinct events and M is the
number of different pattern forms that are intended to be discovered. For
example, M can be very large. However, it is preferable to limit the length of
the patterns to approximately 5 (depending on the application), taking into
consideration that the infrequency of much larger patterns typically makes
them statistically insignificant. With the maximum pattern length set to 5 and
using four temporal operators N, B_K, U and ∧, the value of

$$M = \sum_{i=1}^{5} 4^{i} = 4 \cdot \frac{4^{5}-1}{4-1} = 1364,$$

which is a manageable number.

The structure of this matrix can be as follows: each entry [i, j], with i ∈ 1...N and
j ∈ 1...M, represents the remaining number of yet-to-be-discovered patterns
having the form j whose final event is i. This number can be easily maintained
because it is the total number of occurrences of the event i minus the number
of already discovered patterns of the form j whose final event is i. The
additional (N+1)th row contains the total number of already discovered patterns
(i.e., the sum of the values in the columns) of the form j. Each column of this
array can be sorted such that the literal α precedes the literal β in the column j if the number
of the literals α remaining to be added as suffixes to create patterns of the
form j, divided by Pr[α], exceeds that value for the literal β. This value can be
called the "candidacy value" of the corresponding literal for the corresponding
pattern form. The matrix can be called the "suffix matrix".



b. The second data structure is an array of M x R lists, where M is again the number of
different pattern forms that should be discovered and R is the number of
temporal operators being used. In list j_Op, all patterns of the form j that have
already been discovered are maintained in a sorted order by the number of
occurrences of each pattern yet to be expanded through the use of the operator
Op, divided by E[P]. This value can be called the corresponding pattern's
"candidacy value" for the corresponding operator. Such a value is simple to
calculate since the total number of patterns that have the form P Op X is
known. Along with each pattern, it is possible to maintain the number of
occurrences of the given pattern P, and the locations of the pattern P. This
array can be termed the "set of discovered patterns".

The best combination of an element from each of these two data structures
may be the candidate for the next discovery iteration. For example, at each iteration, it
is possible to assume that the first value in each list in the set of discovered patterns
whose length is less than the maximum allowed pattern length corresponds to the
patterns P_1, P_2, ..., P_M. Additionally, it is possible to assume that the first value in
each column in the suffix matrix corresponds to the literals α_1, α_2, ..., α_M. M values
are then computed by multiplying the candidacy value of each of
these patterns P_i by the first value in the suffix matrix for the pattern form that is
the result of combining the pattern P_i from the set of discovered patterns with the
literal α_j via the operator Op corresponding to the list from which the
pattern P_i was taken. The pattern P_i, literal α_j and operator Op can be selected whose
combination results in the largest value among these M values. In doing so, the goal
of selecting the candidate pattern, literal, and operator whose combination is most
likely to result in the discovery of an interesting pattern can be accomplished. Once
these candidates have been selected, the number of occurrences
of the pattern P_i Op α_j can be computed via linear scans of the location lists for the
pattern P_i and the literal α_j. For example, if Op = N, then it is possible to look for
locations l such that P_i occurs at the location l and α_j occurs at the location l+1. If Op = ∧,
it is possible to look for the locations where both P_i and α_j occur. One of the ways to
initiate the above-described procedure is by choosing the triple (i.e., pattern,
literal, operator) whose combination would most likely result in the discovery of an
interesting pattern. As the procedure progresses, if the given pattern P has not
generated many newly discovered patterns as a candidate for the expansion, the
pattern will preferably percolate toward the top of its associated sorted list. Likewise,
if a literal α has not been used as the suffix of many discovered patterns, the literal
will percolate to the top of its suffix list. In this way, as patterns and literals become
more likely to generate an interesting pattern via their combination, they
become more likely to be chosen as candidates for the next iteration.
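
The selection of the best (pattern, literal, operator) triple may be sketched as follows. This is a simplified illustration under stated assumptions: the two data structures are reduced to dictionaries holding only the head (first) entry of each sorted list or column, both dictionaries are keyed by the hypothetical pair (form, operator), and the name `select_best_triple` does not appear in the original text.

```python
def select_best_triple(pattern_heads, suffix_heads):
    """Pick the (pattern, literal, operator) triple with the largest product.

    pattern_heads maps (form, operator) -> (pattern, candidacy value) for the
    first entry of each list in the set of discovered patterns.
    suffix_heads maps (form, operator) -> (literal, candidacy value) for the
    first entry of the suffix-matrix column of the form obtained by extending
    `form` via `operator`.
    """
    best = None
    for key, (pattern, pattern_value) in pattern_heads.items():
        if key not in suffix_heads:
            continue  # no suffix column for this extension
        literal, literal_value = suffix_heads[key]
        score = pattern_value * literal_value
        if best is None or score > best[0]:
            best = (score, pattern, literal, key[1])
    return best


# Illustrative values only; they are not taken from the original text.
pattern_heads = {("X", "N"): ("A", 16.0), ("X", "B_K"): ("C", 24.0)}
suffix_heads = {("X", "N"): ("B", 16.0), ("X", "B_K"): ("A", 4.0)}
best = select_best_triple(pattern_heads, suffix_heads)
print(best)
```

Only the head of each list needs to be inspected because the lists and columns are kept sorted by candidacy value, so each selection costs time proportional to the number of lists rather than the number of patterns.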

II. CLASSIFICATION OF DATA

Turning back to the method of the present invention illustrated in Figure 2, the
data obtained in the feature selection step is classified (step 110). The classification
of data has been problematic to those having ordinary skill in the art of data mining.
The most widely utilized classification technique entails the use of decision trees.
There are more powerful classification techniques (in the sense that they
are able to represent a more robust class of functions) such as neural networks.
However, those having ordinary skill in the art often do not use the neural networks
for classifying data because the neural networks are computationally complex, and
lack transparency. One of the important features of a classifier is that the resulting
function should ultimately be understandable. It is preferable to understand why a
prediction was made by the classifier in order to better understand the relationships that
exist in the current problem domain. The neural networks are a black box, and while
their predictions may be accurate, they lead to little insight about the problem at hand.

The present invention uses an alternative technique known as "MARS"
(Multivariate Adaptive Regression Splines). A detailed description of MARS is
provided, e.g., in J. Friedman, "Multivariate Adaptive Regression Splines",
The Annals of Statistics, Vol. 19, No. 1, 1991, pp. 1-141. MARS is a nonlinear
technique that overcomes many of the shortcomings of standard decision trees while
being computationally tractable and ultimately interpretable.

Although recursive partitioning may be the most adaptive of the methods
for multivariate function approximation, it suffers from some significant restrictions
that limit its effectiveness. One of the most significant of these restrictions is that the
approximating function is discontinuous at the subregion boundaries (as defined by
splits in the nodes). This severely limits the accuracy of the approximation, especially
when the true underlying function is continuous. Small perturbations in the predictor
variables can potentially result in widely varying predictions. Additionally, decision
trees are poor at approximating several simple classes of functions such as linear and
additive functions. The records obtained from the feature selection are augmented by
a set of temporal features. For example, data records having 9 features may become
data records having 200 features (i.e., high-dimensional data).
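
The discontinuity at a split may be contrasted with the hinge-type basis functions of the form max(0, x - t) used by MARS in the following minimal sketch. The split point, knot and coefficients are arbitrary illustrative values, and the function names are hypothetical.

```python
def stump(x, split=1.0, left=0.0, right=2.0):
    """A one-split decision tree: piecewise constant, with a jump at `split`."""
    return left if x < split else right


def hinge(x, knot):
    """Hinge basis function max(0, x - knot), continuous at the knot."""
    return max(0.0, x - knot)


def hinge_model(x, knot=1.0, intercept=0.0, slope=2.0):
    """A one-term additive model built from a single hinge function."""
    return intercept + slope * hinge(x, knot)


eps = 1e-6
# Change in prediction for a tiny perturbation across the boundary at x = 1.
stump_jump = stump(1.0 + eps) - stump(1.0 - eps)
hinge_jump = hinge_model(1.0 + eps) - hinge_model(1.0 - eps)
print(stump_jump, hinge_jump)
```

A perturbation of the predictor across the boundary changes the stump's prediction by a fixed amount regardless of how small the perturbation is, whereas the hinge model's prediction changes only in proportion to the perturbation.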

Since all classification techniques generate models based on in-sample data
that are designed to perform well on out-of-sample data, and because of the resultant
high dimensionality, the issue of over-fitting may occur as described below.

III. BIAS INCREASE VIA SHRINKAGE

In all classification techniques, the introduction of additional degrees of
freedom reduces the in-sample error (bias) of the model while increasing the model
variance. This frequently results in poor approximations of out-of-sample data. To
address this problem, some classification methods include a technique for reducing
the model variance, typically via a reduction in the classification model's degrees of
freedom. This reduction in degrees of freedom increases bias in the classification
model, while reducing its variance and out-of-sample error.

The combination of the forecasts can be done by averaging, resulting in a
maximum likelihood estimator ("MLE"). To evaluate the applicability and usefulness
of this approach, it is possible to consider the more general situation of trying to
estimate a parameter θ by t(x). For example, if E[t(x)] = θ, then t(x) is an
unbiased estimator of θ and a measure of the precision of this estimator may be
E([t(x)-θ]^2), i.e., its variance. Instead, if E[t(x)] ≠ θ, then t(x) is a biased estimator of
θ. A measure of its precision is still E([t(x)-θ]^2), but because E[t(x)] ≠ θ, this quantity
is not the variance, and is known as the mean squared error. Thus,

$$\begin{aligned}
E\big([t(x)-\theta]^2\big) &= E\big([t(x)-E[t(x)]+E[t(x)]-\theta]^2\big)\\
&= E\big([t(x)-E[t(x)]]^2\big) + \big(E[t(x)]-\theta\big)^2 + 2\big(E[t(x)]-\theta\big)\,E\big[t(x)-E[t(x)]\big]\\
&= E\big([t(x)-E[t(x)]]^2\big) + \big(E[t(x)]-\theta\big)^2\\
&= \operatorname{var}[t(x)] + \big(E[t(x)]-\theta\big)^2\\
&= \operatorname{var}[t(x)] + [\operatorname{Bias}(t)]^2
\end{aligned}$$
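
The decomposition of the mean squared error into variance plus squared bias may be checked numerically on a finite set of estimates, as in the following illustrative sketch. The expectations are taken as plain averages over the listed values, and the function name and sample values are hypothetical.

```python
def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) with expectations taken as sample averages."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((t - mean) ** 2 for t in estimates) / n  # E([t - E[t]]^2)
    bias = mean - theta                                     # E[t] - theta
    mse = sum((t - theta) ** 2 for t in estimates) / n      # E([t - theta]^2)
    return mse, variance, bias


mse, variance, bias = mse_decomposition([1.0, 2.0, 3.0, 6.0], theta=2.0)
print(mse, variance, bias, variance + bias ** 2)
```

Here the average estimate is 3 while the parameter is 2, so the squared bias contributes 1 and the variance contributes 3.5 to a mean squared error of 4.5.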
By accepting an increased bias in exchange for a decreased variance, it is possible to
achieve a uniformly smaller mean squared error than that of the MLE. Stein's estimator,
now known as Stein shrinkage, described in B. Efron et al., "Stein's Estimation Rule
and its Competitors - An Empirical Bayes Approach", Journal of the American
Statistical Association, Vol. 68, March 1973, pp. 117-130, was originally developed
for the case of reducing bias in
linear functions. The results of the Stein's estimator can be extended for the nonlinear
case. For example, by "shrinking" the estimated parameters towards the sample
mean, this approach mitigates the effects of non-stationarity by reducing the impact of
deviations in the distributions of the estimator variables between in-sample and
out-of-sample data.
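
Shrinking a set of estimates toward their sample mean may be sketched as follows. The text does not give the exact shrinkage formula used, so this sketch assumes the standard positive-part James-Stein form for shrinkage toward the grand mean with a known, common noise variance; the function name `shrink_toward_mean` is hypothetical.

```python
def shrink_toward_mean(estimates, noise_variance):
    """Positive-part James-Stein shrinkage of k >= 4 estimates toward their mean."""
    k = len(estimates)
    if k < 4:
        raise ValueError("shrinkage toward the sample mean needs at least 4 estimates")
    mean = sum(estimates) / k
    spread = sum((y - mean) ** 2 for y in estimates)
    if spread == 0.0:
        return list(estimates)  # all estimates already coincide with the mean
    # Shrinkage factor, clipped at zero so estimates never overshoot the mean.
    factor = max(0.0, 1.0 - (k - 3) * noise_variance / spread)
    return [mean + factor * (y - mean) for y in estimates]


raw = [1.0, 2.0, 3.0, 6.0]
shrunk = shrink_toward_mean(raw, noise_variance=1.0)
print(shrunk)
```

Each shrunk value lies between the raw estimate and the sample mean, which introduces bias into the individual estimates while reducing their spread.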

Thereafter, in step 130 of Figure 2, the prediction model is generated from the
data records on which feature selection was performed, and/or which were classified
and then shrunk. Finally, in step 140, such prediction model and/or the classified and
reduced data are output to the printer 30, display device 40, storage device 50 and/or
communications network 60.

IV. EXEMPLARY APPLICABILITY OF THE PRESENT INVENTION

The system and method according to the present invention can be used in two
exemplary settings, e.g., a network intrusion detection and a disease classification.
Embodiments of the present invention for each of these exemplary settings are
discussed below.

A. Network intrusion detection addresses the problem of detecting intrusions on
a computer network. In summary, the training data may consist of a set of TCP/IP
records that have been scored 0/1 depending on whether that connection was part of
an attack, as well as with the specific attack type. The intrusion detection system then
learns features that distinguish normal from malicious network activity. These
features then become the input to a classifier which, when run on out-of-sample data,
scores each record based on the likelihood that it is part of an attack. Finally, the third
stage is to combine the classifiers that result from training on many in-sample training
sets as well as to mitigate the problems of over-fitting and non-stationarity.

Figure 5 shows one such exemplary intrusion detection system ("IDS") 400
according to the present invention. First, data is collected in the form of log files that
consist of a sequence of records about activity on the network. The log files can be
collected via a local area network 440 from an information server 410, an attached
firewall 420, user workstations 430 and/or other sites. One record can be created for
each connection that occurs. The information in each record may include the time and
date of the connection, the type of service provided, the source and destination ports,
the source and destination IP addresses, the duration of the connection, and other data.

The IDS 400 described above serves two purposes, e.g., data collection and
network activity monitoring, and intrusion identification. In serving these roles, the
IDS 400 may include or be connected to a large database (e.g., the storage device 50)
for data storage, and a computational engine for scoring individual network activity
records based on their likelihood of being part of an intrusion. In the training phase
and as illustrated in Figure 6, the IDS accumulates the data generated at the various
monitoring points on the network (step 500). The aggregated data records are then
scored manually, e.g., with a score of 1 indicating that the given record was part of a
network attack and a score of 0 indicating that the record was part of normal activity.
This exemplary embodiment of the present invention may use, e.g., scored data
generated by the Defense Department's Advanced Research Projects Agency (DARPA).
Once collected, this data becomes the input to the IDS 400, as shown in further detail
in Figure 7. The initial set of data records represents the input to the first stage of the
technique, i.e., feature selection. In this stage a set of additional features (typically
several hundred) are generated and added to each data record (step 510). The first set
of data records are set as the current data records in step 520, and the current data
records are input to the second stage of the technique - classification, i.e., MARS.
MARS generates a functional model of the data capable of predicting intrusions on
out-of-sample data based on the current data records (step 530). Then, in step 540, it
is determined whether all sets of the modified data records were utilized. If not, in
step 550, the next set of the modified data records is set as the current set of records,
and the process returns to step 530 so that a number of functional models are
generated. This set of models is then input into the final stage of the technique, i.e.,
shrinkage. Shrinkage results in the generation of a single model based on the
aggregation of all of the predictor models generated (step 560). This is done in a way
to mitigate the effects of non-stationarity in the data. This final model is then
incorporated into the IDS 400. In the IDS 400, the model monitors network activity,
identifying activity that is part of an (attempted) intrusion on the network.
Concurrently, the IDS 400 may accumulate data records generated by the network
monitors for use as future training data to the model. This allows the system and
method of the present invention to continuously update itself based on changes in the
types of activity occurring on the network.
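
The flow of steps 510 through 560 may be outlined as in the following sketch. It is a structural illustration only: the three stages are passed in as hypothetical callables, and the toy stand-ins shown (a single derived feature, an attack-rate "model" and plain averaging) are assumptions that take the place of the temporal feature selection, MARS and shrinkage stages described above.

```python
def train_ids_model(record_sets, add_features, fit_model, combine_models):
    """Augment every record (step 510), fit one model per set (steps 520-550),
    then aggregate the models into a single model (step 560)."""
    augmented_sets = [[add_features(r) for r in records] for records in record_sets]
    models = [fit_model(records) for records in augmented_sets]
    return combine_models(models)


# Toy stand-ins, for illustration only.
def add_long_flag(record):
    return {**record, "long": record["duration"] > 60}


def fit_rate_model(records):
    long_records = [r for r in records if r["long"]]
    rate = (
        sum(r["attack"] for r in long_records) / len(long_records)
        if long_records
        else 0.0
    )
    return lambda record: rate if record["long"] else 0.0


def average_models(models):
    return lambda record: sum(m(record) for m in models) / len(models)


record_sets = [
    [{"duration": 120, "attack": 1}, {"duration": 10, "attack": 0}],
    [{"duration": 90, "attack": 0}, {"duration": 100, "attack": 1}],
]
model = train_ids_model(record_sets, add_long_flag, fit_rate_model, average_models)
score_long = model(add_long_flag({"duration": 300}))
score_short = model(add_long_flag({"duration": 5}))
print(score_long, score_short)
```

New records are passed through the same feature-augmentation step before scoring, mirroring the way the final model is applied to out-of-sample network activity.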

B. In the disease classification, the main focus can be on cancer. Given that
cancer results from changes in the DNA of healthy cells, the present invention
provides an approach to cancer classification based on gene expression. Both the
cancer classification problem as well as the class discovery problem are addressed by
identifying discrepancies in gene expression between healthy and cancerous cells. It
is then possible to evaluate the quality of the approach of the system and method
according to the present invention to cancer classification by considering RNA
samples from healthy individuals as well as samples from patients from multiple
known cancer classes as identified by their histopathological appearance, for
accurately and consistently validating the diagnosis made by hematopathologists on
genetic grounds. This is achieved by training the system (as described below) on
RNA samples that are properly labeled by their cancer class (or labeled as being
healthy). By discovering the genetic differences among cancer classes, a predictive
model of these classes is generated which can then be tested via cross validation and
through testing on out-of-sample data, and a class discovery can be performed. For
example, the system is trained on the same RNA samples. This time, however, these
samples are unlabeled. The classes associated with each sample are discovered
without prior knowledge of this information. Additionally, novel classes within
these samples are discovered.

As shown in Figure 8, healthy DNA and cancerous DNA can each be dyed
different colors and hybridized on a micro-array containing thousands of genes
expected to be relevant to cell growth and regulation. Through this process, the
expression levels of these targeted genes can be compared between the healthy and
cancerous cells. The cancer classifier then constructs a model capable of classifying
future DNA samples as either healthy or cancerous. Additionally, DNA samples from
two different cancer types can be hybridized and a model constructed that identifies
the cancer type of an out-of-sample, cancerous DNA strand. Through this process,
the system is first capable of determining whether or not a DNA sample is cancerous,
and if it is, then identifying the associated cancer type. These results improve the
targeting of treatment to specific cancer types. Described below is
how to distinguish between healthy and cancerous DNA, although the process may
not be identical for identifying specific cancer types.

The data collected from the micro-array is a set of gene expression levels for
both normal and cancerous DNA in thousands of different genes. Once collected, this
data becomes the input to the cancer classification system (CCS) (see diagram below).
As shown in Figure 9, the set of expression levels represents the input to the first
stage of the method and system according to the present invention, i.e., feature
selection. In this stage a set of features (typically several hundred) is generated.
These features represent relevant relationships between the expression levels of
different genes in terms of their ability to distinguish healthy from cancerous DNA.
An example of a potential feature is, e.g., ExpressionLevel(Gene#32) > T AND
ExpressionLevel(Gene#656) > T. This feature provides that the expression levels
of both gene number 32 and gene number 656 exceed some threshold, and may be included if it
represents a situation that is either highly correlated with healthy or highly correlated
with cancerous DNA. Such features are then input to the second stage of the
technique, i.e., MARS. MARS generates a functional model of the data capable of
distinguishing between healthy and cancerous DNA on out-of-sample data. This
process is typically executed several times on different training data sets, thus
generating several models. This set of models is then input into the final stage of the
technique, i.e., shrinkage. Shrinkage results in the generation of a single model
based on the aggregation of all of the predictor models generated. The combination
of models is particularly relevant to cancer classification when attempting to build a
model that differentiates between several cancer types. Models are initially
constructed to distinguish between pairs of cancer classes. Shrinkage then combines
these models to create a single monolithic classifier capable of distinguishing between
many different cancer classes.
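
The exemplary feature ExpressionLevel(Gene#32) > T AND ExpressionLevel(Gene#656) > T may be expressed as a simple predicate, as sketched below. The function name, the dictionary representation of a sample and the numeric values are hypothetical and chosen only for illustration.

```python
def both_above(levels, gene_a, gene_b, threshold):
    """True when the expression levels of both genes exceed the threshold."""
    return levels[gene_a] > threshold and levels[gene_b] > threshold


# A sample is represented here as a mapping from gene number to expression level.
sample_1 = {32: 4.1, 656: 3.7}
sample_2 = {32: 4.1, 656: 0.9}

feature_1 = both_above(sample_1, 32, 656, threshold=2.0)
feature_2 = both_above(sample_2, 32, 656, threshold=2.0)
print(feature_1, feature_2)
```

Each such predicate yields one binary column per sample, and several hundred of these columns form the augmented records passed to the classification stage.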

One having ordinary skill in the art would clearly recognize that many other
domains and applications in which data is temporal and/or non-stationary in
nature can benefit from using this system and method for classification according to the
present invention. Indeed, the present invention is in no way limited to the exemplary
applications and embodiments thereof described above.

CLAIMS

1. A method for determining a feature of a particular pattern, comprising
the steps of:

a) receiving data records;

b) obtaining predetermined patterns that are associated with at least some
of the data records;

c) extracting particular information from at least a subset of the received
data records, the particular information being indicative of the particular pattern for at
least some of the data records; and

d) determining whether the particular pattern is an unexpected pattern
based on the obtained predetermined patterns.

2. The method according to claim 1, wherein at least one record of the data
records includes temporal data.

3. The method according to claim 1, wherein at least one record of the data
records includes non-stationary data.

4. The method according to claim 1, wherein step (b) comprises the substeps of:

i. assigning a threshold, and

ii. correlating the data records into sets of patterns as a function of
the threshold.

5. The method according to claim 4, wherein step (d) includes the substep of
determining if the particular pattern corresponds to at least one pattern of the sets of
patterns.

6. The method according to claim 5, wherein the unexpected pattern is
established if the particular pattern does not correspond to any pattern of the sets of
patterns.



7. The method according to claim 1, wherein the unexpected pattern is indicative
of an interestingness measure in the predetermined pattern.

8. The method according to claim 1, wherein the data records include input
sequences, and wherein step (d) comprises the substep of scanning the input
sequences to determine an interestingness measure of at least one event in the input
sequences.

9. The method according to claim 8, wherein step (d) comprises the substeps of:

i. initializing a pattern list by inserting all events of the input
sequences therein, and

ii. from all patterns in the pattern list, selecting a first pattern
which has a largest interestingness measure.

10. The method according to claim 9, wherein the data records include a
maximum allowable length value, and wherein step (d) comprises the substeps
of:

iii. expanding the first pattern to be a second pattern,

iv. if a length of the second pattern is greater than the maximum
allowable value, adding the second pattern to the pattern list,
and

v. if a length of the second pattern is less than or equal to the
maximum allowable value, subtracting the first pattern from the
pattern list.

11. The method according to claim 10, wherein step (d) comprises the substep of
repeating substeps (ii)-(v) until the pattern list becomes empty.

12. The method according to claim 11, further comprising the step of:

e) outputting the particular pattern which includes the interestingness
measure.

13. The method according to claim 8, wherein step (d) comprises the substeps of:

i. initializing a pattern list by inserting all events of the input 
sequences therein, 

ii. initializing at least one suffix list, 




iii. calculating locations of certain patterns of the input sequences,

iv. updating previously discovered patterns based on the calculated
locations, and

v. updating the at least one suffix list using the certain patterns.

14. The method according to claim 13, wherein the data records include a
maximum allowable length value, and wherein step (d) comprises the substep
of:

vi. if a length of the second pattern is greater than or equal to the
maximum allowable value, repeating substeps (iii)-(v).

15. The method according to claim 1, wherein step (d) includes the substep of:

i. generating further records by modifying the data records to
include additional features.

16. The method according to claim 15, further comprising the step of:

(f) generating a functional model using the further records.

17. The method according to claim 16, wherein substep (i) includes generating a
plurality of sets of the further records, and wherein step (f) is executed for each set of
the further records.

18. The method according to claim 17, wherein step (f) includes the substep of
generating a single model based on each functional model of the respective set of the
further records.

19. The method according to claim 1, further comprising the steps of: 

(g) after step (d), classifying the data records which have the unexpected 
pattern associated therewith; and 

(h) generating a prediction model as a function of the classified data
records.



20. The method according to claim 19, wherein step (g) is performed using a 
Multivariate Adaptive Regression Splines technique. 






21. The method according to claim 19, further comprising the step of:

(i) shrinking at least one of data and parameters of the classified data 
records. 

22. The method according to claim 21, wherein step (i) includes the substep of
determining a mean of the at least one of the data and the parameters.

23. The method according to claim 21, wherein step (i) is performed using a 
Stein's Estimator Rule technique. 

24. The method according to claim 1, wherein at least one of the predetermined 
patterns utilizes temporal modal operators. 

25. The method according to claim 1, wherein at least one of the predetermined
patterns utilizes logical connectives. 

26. The method according to claim 1, wherein at least one of the predetermined 
patterns is generated by a computer program.

27. A system for determining a feature of a particular pattern, comprising:
a processing arrangement programmed to:

a) receive data records,

b) obtain predetermined patterns that are associated with at least some
of the data records,

c) extract particular information from at least a subset of the received
data records, the particular information being indicative of the
particular pattern for at least some of the data records, and

d) determine whether the particular pattern is an unexpected pattern
based on the obtained predetermined patterns.

28. The system according to claim 27, wherein at least one record of the data
records includes temporal data.

29. The system according to claim 27, wherein at least one record of the data
records includes non-stationary data.

30. The system according to claim 27, wherein, in step (b), the processing
arrangement:

i. assigns a threshold, and

ii. correlates the data records into sets of patterns as a function of
the threshold.

31. The system according to claim 30, wherein, in step (d), the processing
arrangement determines if the particular pattern corresponds to at least one pattern of
the sets of patterns.

32. The system according to claim 31, wherein the unexpected pattern is
established if the particular pattern does not correspond to any pattern of the sets of
patterns.

33. The system according to claim 27, wherein the unexpected pattern is 
indicative of an interestingness measure in the predetermined pattern. 

34. The system according to claim 27, wherein the data records include input
sequences, and wherein step (d) comprises the substep of scanning the input
sequences to determine an interestingness measure of at least one event in the input
sequences.

35. The system according to claim 34, wherein, in step (d), the processing 
arrangement: 

i. initializes a pattern list by inserting all events of the input
sequences therein, and

ii. from all patterns in the pattern list, selects a first pattern which 
has a largest interestingness measure. 

36. The system according to claim 35, wherein the data records include a
maximum allowable length value, and wherein, in step (d), the processing
arrangement:

iii. expands the first pattern to be a second pattern,

iv. if a length of the second pattern is greater than the maximum
allowable value, adds the second pattern to the pattern list, and

v. if a length of the second pattern is less than or equal to the
maximum allowable value, subtracts the first pattern from the
pattern list.

37. The system according to claim 36, wherein, in step (d), the processing 
arrangement repeats substeps (ii)-(v) until the pattern list becomes empty. 

38. The system according to claim 37, wherein the processing arrangement is
further programmed to:

e) output the particular pattern which includes the interestingness
measure.

39. The system according to claim 34, wherein, in step (d), the processing 
arrangement: 

i. initializes a pattern list by inserting all events of the input
sequences therein,

ii. initializes at least one suffix list,

iii. calculates locations of certain patterns of the input sequences,

iv. updates previously discovered patterns based on the calculated
locations, and

v. updates the at least one suffix list using the certain patterns.

40. The system according to claim 39, wherein the data records include a 
maximum allowable length value, and wherein, in step (d), the processing 
arrangement: 

vi. repeats substeps (iii)-(v) if a length of the second pattern is
greater than or equal to the maximum allowable value.



41. The system according to claim 27, wherein, in step (d), the processing
arrangement:

i. generates further records by modifying the data records to
include additional features.

42. The system according to claim 41, wherein the processing arrangement is
further programmed to:

(f) generate a functional model using the further records.

43. The system according to claim 42, wherein, in substep (i), the processing
arrangement generates a plurality of sets of the further records, and wherein the 
processing arrangement executes step (f) for each set of the further records. 

44. The system according to claim 43, wherein, in step (f), the processing
arrangement generates a single model based on each functional model of the
respective set of the further records.

45. The system according to claim 27, wherein the processing arrangement is 
further programmed to: 

(g) after step (d), classify the data records which have the unexpected
pattern associated therewith, and

(h) generate a prediction model as a function of the classified data records.

46. The system according to claim 45, wherein the processing arrangement 
performs step (g) using a Multivariate Adaptive Regression Splines technique. 

47. The system according to claim 45, wherein the processing arrangement is
further programmed to:

(i) shrink at least one of data and parameters of the classified data records.

48. The system according to claim 47, wherein, in step (i), the processing 
arrangement determines a mean of the at least one of the data and the parameters. 

49. The system according to claim 47, wherein the processing arrangement
performs step (i) using a Stein's Estimator Rule technique.

50. The system according to claim 27, wherein at least one of the predetermined
patterns utilizes temporal modal operators.

51. The system according to claim 27, wherein at least one of the predetermined
patterns utilizes logical connectives.

52. The system according to claim 27, wherein at least one of the predetermined
patterns is generated by a computer program.

53. A method for classifying and reducing at least one of data and parameters
provided in the data records, comprising the steps of:

a) receiving data records;

b) classifying the data records which have at least one particular pattern,
the data records being classified using a Multivariate Adaptive Regression Splines
technique; and

c) shrinking the at least one of the data and the parameters of the
classified data records using a Stein's Estimator Rule technique.

54. The method according to claim 53, further comprising the steps of:

d) obtaining predetermined patterns that are associated with at least some
of the data records;

e) extracting particular information from at least a subset of the received
data records, the particular information being indicative of the at least one particular
pattern in at least some of the data records; and

f) determining whether the at least one particular pattern is an unexpected
pattern based on the obtained predetermined patterns.

55. A system for classifying and reducing at least one of data and parameters 
provided in data records, comprising: 
a processing arrangement programmed to:

a) receive the data records,

b) classify the data records which have at least one particular pattern, the
data records being classified using a Multivariate Adaptive Regression
Splines technique, and

c) shrink at least one of data and parameters of the classified data records
using a Stein's Estimator Rule technique.

56. The system according to claim 55, wherein the processing arrangement is
further programmed to:

d) obtain predetermined patterns that are associated with at least some of
the data records,

e) extract particular information from at least a subset of the received data
records, the particular information being indicative of the at least one
particular pattern in at least some of the data records, and

f) determine whether the at least one particular pattern is an unexpected
pattern based on the obtained predetermined patterns.



to the threshold value 140. If the "final score" 270 is equal to or greater
than the threshold value 140 obtained during enrollment, the user is
verified. If the "final score" 270 is less than the threshold value 140 then
the user is not verified or permitted to complete the transaction requiring
verification.

In addition to channel adaptation 180, the present invention employs
several further adaptations.

As previously described, the multiple classifier system uses a
classifier fusion module 130, 260 incorporating a fusion function to
advantageously combine the strengths of the individual classifiers and
avoid their weaknesses. However, the fusion function that is set during
enrollment may not be optimal for testing, in that every single
classifier may have its own preferred operating conditions. Therefore, as
the operating environment changes, the fusion function changes
accordingly in order to achieve the optimal results for fusion. Also, for
each user, one classifier may perform better than the other. An adaptable
fusion function provides more weight to the better classifier. Fusion
adaptation uses predetermined knowledge of the performance of the
classifiers to update the fusion function so that the amount of emphasis
being put on a particular classifier varies from time to time based on its
performance.

As shown in Figure 2, a fusion adaptation module 290 is connected 
to the classifier fusion module 280. The fusion adaptation module 290 
changes the constant, a, in the linear pool data fusion function described 
previously with respect to Figure 2, which is: 
$$S(a) = \sum_{i=1}^{n} a_i s_i$$

In the present invention two classifiers are used (NTN 80, 220 and
GMM 90, 230), where $s_1$ is the score of the first classifier and $s_2$ is
the score of the second classifier. In this instance the equation becomes:

$$S = a s_1 + (1 - a) s_2$$

The fusion adaptation module 290 dynamically changes a to weigh
either the NTN ($s_1$) or GMM ($s_2$) classifier more than the other,
depending on which classifier turns out to be more indicative of a true
verification.

The fusion adaptation module 290 is shown in Figure 8. The first
step of fusion adaptation is to determine whether the fusion adaptation
criteria are met 500. The fusion adaptation criteria are met in any number
of circumstances, which may be dependent on the type of voice
verification system being implemented in a particular application. For
example, the fusion adaptation criteria may be met in the following cases:
after every five (or another predetermined number of) successful
verifications, if the scores of the classifiers (i.e., the GMM score and the
NTN score) differ by more than a predetermined amount, if it is found
that the true user was not verified for a predetermined number of
attempts (false-negative results), if it is found that an imposter was
verified for one or more attempts (false-positive results), or during a time
period (e.g., the first week of use by a particular user). In these cases, the
system is not working at its optimal efficiency and needs further
adaptation to improve. Because fusion adaptation may affect the number
of false-positive results and the number of false-negative results, the
inclusion criteria may be made dependent on the amount of tolerance
which is deemed acceptable for these possibilities.

As shown in Figure 8, if the inclusion criteria are met, the classifier
closest to the threshold is assessed. Specifically, it is determined whether
$s_1$ is closer to the threshold value than $s_2$ 510. If $s_1$ is closer to the
threshold than $s_2$, the constant, a, is increased 520 to provide more weight
to $s_1$. If not, then a is decreased 530 to provide more weight to $s_2$. The
amount by which a is increased or decreased depends on the particular
application, and may be a constant amount or a variable amount, depending
on the amount of error in the system, the amount of tolerance for
false-positive results, the amount of tolerance for false-negative results,
etc. The modified constant, a, is then stored 540 in the voice print
database 115 for use in the testing component.

Thus, the weighting of the different classifier models may be
dynamically changed to adapt the system by changing the fusion constant, a.
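The two-classifier fusion and the update of Figure 8 can be illustrated with a minimal Python sketch. The function names, the fixed step size of 0.05, and the clamping of a to the range [0, 1] are assumptions made for illustration; the description leaves the size of the adjustment to the particular application.

```python
def fuse(s1: float, s2: float, a: float) -> float:
    """Linear fusion of two classifier scores: S = a*s1 + (1 - a)*s2."""
    return a * s1 + (1.0 - a) * s2


def adapt_fusion_constant(s1: float, s2: float, threshold: float,
                          a: float, step: float = 0.05) -> float:
    """Return an updated fusion constant following steps 510-530.

    The constant step and the clamp to [0, 1] are illustrative
    assumptions, not values taken from the description.
    """
    if abs(s1 - threshold) < abs(s2 - threshold):
        a += step  # step 520: s1 is closer to the threshold, weight it more
    else:
        a -= step  # step 530: otherwise give more weight to s2
    return min(1.0, max(0.0, a))
```

For instance, with a threshold of 0.5 and scores s1 = 0.55 and s2 = 0.9, the first score is closer to the threshold, so a starting value of 0.5 would be raised by one step.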

Threshold adaptation adapts the threshold value in response to
prior final scores. Threshold adaptation module 295 is shown in Figure 2.
Figure 9 shows an example of threshold adaptation 600. First, the
threshold adaptation inclusion criteria are assessed 610. If the inclusion
criteria are not met, the process ends and no threshold adaptation takes
place. The inclusion criteria may vary depending on the particular
application, as described previously with respect to Figure 8 (fusion
adaptation) and Figure 10 (model adaptation). It is also to be noted that
threshold adaptation 600 may affect the number of false-positive results
and the number of false-negative results. Therefore, the inclusion criteria
may be made dependent on the amount of tolerance which is deemed
acceptable for these possibilities. Threshold adaptation 600 analyzes one or
more prior final scores and adapts the threshold in response to the
analysis.

With continued reference to Figure 9, after assessing the inclusion
criteria 610, one or more previous final scores, which may include the
present final score, are recalled (if necessary) and analyzed 620. The
analysis may be simple or complex. For example, the analysis may be the
average or mean of all the successful verifications, or, preferably, the
analysis may be the average or mean of one or more unsuccessful
verifications in which it is known that false-negative results were
obtained.

The new threshold is calculated 630 from this analysis. For
example, if the average of four unsuccessful verifications in which it is
known that false-negative results were obtained is 0.4, then the new
threshold may be set to 0.3. The analysis 620 and calculation 630 of a new
threshold may depend on the amount of tolerance which is deemed
acceptable for false-negative and false-positive results. For example, if
false-positive results are somewhat tolerable, then the new threshold may
be set to the lowest final score in which it is known that a false-negative
result occurred.

After calculating the new threshold, the new threshold is saved
640 for use in current or future testing.
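The analysis 620 and calculation 630 might be sketched in Python as follows. The function name, the default margin of 0.1 below the mean, and the `tolerate_false_positives` flag are assumptions; the margin is chosen only so that the sketch reproduces the example above, in which a mean of 0.4 leads to a new threshold of 0.3.

```python
from statistics import mean


def adapt_threshold(false_negative_scores, current_threshold,
                    margin=0.1, tolerate_false_positives=False):
    """Return a new threshold from final scores of known false negatives.

    margin and tolerate_false_positives are hypothetical parameters
    introduced for this illustration.
    """
    if not false_negative_scores:
        # Nothing to analyze, so the stored threshold is kept.
        return current_threshold
    if tolerate_false_positives:
        # Use the lowest final score at which a false negative occurred.
        return min(false_negative_scores)
    # Otherwise place the threshold a fixed margin below the mean score.
    return mean(false_negative_scores) - margin
```

The returned value would then be saved for later testing, as in step 640.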

Model adaptation adapts the classifier models to subsequent
successful verifications. Figure 10 shows one example of model adaptation
540. First, the inclusion criteria for model adaptation are assessed 550. If
the inclusion criteria are not met, the process ends and no model
adaptation takes place. The inclusion criteria may vary depending on the
particular application, as described previously with respect to Figure 8 and
fusion adaptation. It is also to be noted that model adaptation 540 may
affect the number of false-positive results and the number of
false-negative results because there is a small chance that the successful
test speech is a false-positive. Therefore, the inclusion criteria may be made
dependent on the amount of tolerance which is deemed acceptable for
these possibilities. Model adaptation 540 uses the test speech as
enrollment speech, and retrains the classifier models 80, 90, 220 and 230
with the additional data sample (test speech) in a re-enrollment process
that is transparent to the user. Therefore, one of the inclusion criteria is
that verification is successful for each test speech used in model
adaptation.

With continued reference to Figure 10, after assessing the inclusion
criteria 550, the number of samples and their corresponding enrollment
speech is identified 560, or recalled from the voice print database 115 if
necessary. The previously stored enrollment speech, extracted features,
and segmentation (subword) information are recalled from the voice print
database 115, along with previous successful test speech and its associated
extracted features.

For example, the previous four test speech samples in which
successful verification occurred may be recalled from the voice print
database 115, as well as the four initial training samples of enrollment
speech. This doubles the number of training samples from four to eight.
In order to limit the number of training samples, a "forget" factor may be
built into the system, so that one or more samples are discarded.
For example, only the latest eight samples may be remembered, or only the
initial four enrollment speech samples and the newest four successful test
samples. The number of samples, and which samples are used, may
depend on the tolerance for false-positive results and false-negative
results, since the model adaptation will change these probabilities.
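The two "forget" policies mentioned above can be sketched in Python. The function name, the `keep_enrollment` switch, and the representation of samples as list items ordered from oldest to newest are assumptions made for illustration only.

```python
def select_training_samples(enrollment, successful_tests,
                            max_samples=8, keep_enrollment=True):
    """Choose the samples used for retraining under a "forget" factor.

    Both input lists are assumed to be ordered oldest to newest.
    keep_enrollment=True keeps every initial enrollment sample and fills
    the remaining room with the newest successful test samples;
    keep_enrollment=False keeps only the latest max_samples overall.
    """
    if keep_enrollment:
        room = max(0, max_samples - len(enrollment))
        newest = successful_tests[-room:] if room else []
        return list(enrollment) + list(newest)
    combined = list(enrollment) + list(successful_tests)
    return combined[-max_samples:] if max_samples > 0 else []
```

With four enrollment samples and six successful test samples, the first policy keeps the four enrollment samples plus the four newest test samples, while the second keeps the last eight samples of the combined history.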

After identifying the number of samples and associated speech 560,
training of the multiple models occurs as previously described with respect
to Figure 1A. Therefore the remaining portion of Figure 10 corresponds to
the multiple classifier model and leave-one-out methodology of Figure
1A. A new threshold value will be obtained by the retrained model.
Model adaptation 540, as shown in Figure 10, operates in conjunction with
the classifiers shown in Figures 1 and 2.

Model adaptation 540 may also occur as described in copending
Provisional Application Serial No. , entitled "Model Adaption
System And Method For Speaker Verification," filed on November 3, 1997
by Kevin Farrell and William Mistretta.

Model adaptation 540 is useful for adjusting the system to adapt to
gradual changes in the user's voice over long periods of time.

Fusion adaptation 290, model adaptation 540, and threshold
adaptation 600 all may affect the number and probability of obtaining
false-negative and false-positive results, so they should be used with
caution. These adaptive techniques may be used in combination with channel
adaptation 180, or each other, either simultaneously or at different
authorization occurrences. Model adaptation is more dramatic than threshold
adaptation or fusion adaptation, which both provide incremental changes
to the system.

The voice print database 115 may or may not be coresident with the
antispeaker database 110. Voice print data stored in the voice print
database may include: enrollment channel estimate, classifier models, list
of antispeakers selected for training, fusion constant, threshold value,
normalized segment durations, and/or other intermediate scores or
authorization results used for adaptation.

3. "Bootstrapping" Component.

Because the enrollment component 10 uses the "closest"
antispeaker data to generate the threshold value 140, the antispeaker
database 110 must initially be filled with antispeaker data. The initial
antispeaker data may be generated via artificial simulation techniques, or
can be obtained from a pre-existing database, or the database may be
"bootstrapped" with data by the bootstrapping component.

Figure 11 shows a bootstrapping component 700. The bootstrapping
component 700 first obtains antispeaker speech 710, and then preprocesses
the speech 720 as previously described with respect to Figure 1A. The
antispeaker speech may be phrases from any number of speakers who will
not be registered in the database as users. Next, the antispeaker speech is
inverse-channel filtered 730 to remove the effects of the antispeaker
channel as described with respect to Figures 1 and 2. As shown in Figure
11, the processed and filtered antispeaker speech then undergoes feature
extraction 770. The feature extraction may occur as previously described
with respect to Figure 1A. Next, the antispeaker speech undergoes
sub-word generation 750, using the techniques previously described with
respect to Figure 1A. The preferable method of sub-word generation is
automatic blind speech segmentation, discussed previously with respect to
Figure 1A. The sub-words are then registered as antispeaker data 760 in
the database.

Thus, the bootstrapping component initializes the database with
antispeaker data which then may be compared to enrollment data in the
enrollment component.

The present invention provides accurate and reliable
automatic speaker verification (ASV), which uses adaptive techniques to
improve performance. A key word/key phrase spotter 200 and automatic
blind speech segmentation improve the usefulness of the system.
Adaptation schemes adapt the ASV to changes in successes/failures and to
changes in the user by using channel adaptation 180, model adaptation 540,
fusion adaptation 290, and threshold adaptation 600.

The foregoing description of the present invention has been
presented for purposes of illustration and description and is not
intended to limit the invention to the specific embodiments described.
Consequently, variations and modifications commensurate with the
above teachings, and within the skill and knowledge of the relevant art,
are part of the scope of the present invention. It is intended that the
appended claims be construed to include alternative embodiments to the
extent permitted by law.

CLAIMS:

1. An automatic speaker verification system comprising: 

a receiver, the receiver obtaining enrollment speech over an 
enrollment channel; 

a means, connected to the receiver, for developing an estimate of 
the enrollment channel; 

a first storage device, connected to the receiver, for storing the 
enrollment channel estimate; 

a means for extracting predetermined features of the enrollment 
speech; 

a means, operably connected to the extracting means, for 
segmenting the predetermined features of the enrollment speech, wherein 
the features are segmented into a plurality of subwords; 

at least one classifier, connected to the segmenting means, wherein 
the classifier models the plurality of subwords and outputs one or more
classifier scores. 

2. The automatic speaker verification system of claim 1, further 
comprising: 

an analog to digital converter, connected to the receiver, for
providing the obtained enrollment speech in a digital format.

3. The automatic speaker verification system of claim 1, wherein at
least one classifier is a neural tree network classifier.

4. The automatic speaker verification system of claim 1, wherein at 
least one classifier is a Gaussian mixture model classifier. 

5. The automatic speaker verification system of claim 1, wherein the 
classifiers comprise: 

at least one Gaussian mixture model classifier, the Gaussian
mixture model classifier resulting in a first classifier score; and

at least one neural tree network classifier, the neural tree network
classifier resulting in a second classifier score.

6. The automatic speaker verification system of claim 1, further 
comprising a means, connected to the classifier, for fusing the classifier 
scores, wherein the fusing means weighs the scores from the classifier 
models with a fusion constant and combines the weighted scores resulting 
in a final score for the combined system. 

7. The automatic speaker verification system of claim 6, wherein the 
weighted scores are variable and are dynamically adapted. 

8. The automatic speaker verification system of claim 1, wherein the
segmenting means generates subwords using automatic blind speech
segmentation.

9. The automatic speaker verification system of claim 1, wherein the 
estimating means comprises a means for creating a filter representing 
characteristics of the enrollment channel. 

10. The automatic speaker verification system of claim 1, further 
comprising a second storage device, connected to the classifier, for storing 
the one or more classifier scores. 

11. An automatic speaker verification method, comprising the steps of: 
obtaining enrollment speech over an enrollment channel; 
storing an estimate of the enrollment channel; 

extracting predetermined features of the enrollment speech; 

segmenting the enrollment speech, wherein the enrollment speech 
is segmented into a plurality of subwords; and 

modeling the plurality of subwords using one or more classifier
models resulting in an output of one or more classifier scores.

12. The automatic speaker verification method of claim 11, further
comprising the steps of:

digitizing the obtained enrollment speech; and 

preprocessing the digitized enrollment speech. 

13. The automatic speaker verification method of claim 11, wherein the
step of modeling comprises the step of scoring at least one neural tree 
network classifier. 

14. The automatic speaker verification method of claim 11, wherein the
step of modeling further comprises the steps of:

scoring at least one Gaussian mixture model classifier, the Gaussian
mixture model classifier resulting in a first classifier score;

scoring at least one neural tree network classifier, the neural tree
network classifier resulting in a second classifier score; and

fusing the first and second classifier scores.

15. The automatic speaker verification method of claim 11, further
comprising the steps of: 

weighing the scores from the classifier models with a fusion 
constant; and combining the weighted scores resulting in a final 
score for the combined system. 

16. The automatic speaker verification method of claim 15, wherein the
fusion constant is variable and is dynamically adapted. 

17. The automatic speaker verification method of claim 11, wherein the
step of segmenting comprises generating subwords using automatic blind
speech segmentation.

18. The automatic speaker verification method of claim 11, wherein the
step of storing an estimate of the enrollment channel comprises the step of
creating a filter representing characteristics of the enrollment channel.

19. An automatic speaker verification method, comprising the steps of:

obtaining enrollment speech over an enrollment channel;

storing an estimate of the enrollment channel, the estimate being a
filter representing characteristics of the enrollment channel;

receiving test speech over a testing channel;

inverse filtering the test speech to create filtered test speech;

recalling the estimate of the enrollment channel;

filtering the filtered test speech through the recalled estimate of the
enrollment channel to create enrollment filtered test speech; and

determining whether the enrollment filtered test speech comes
from the same person as the enrollment speech.

20. The automatic speaker verification method of claim 19, wherein the
step of storing an estimate of the enrollment channel comprises the step of 
creating a filter representing characteristics of the enrollment channel. 

21. The automatic speaker verification method of claim 19, wherein the
step of inverse filtering the test speech comprises the step of creating a
filter representing inverse characteristics of the testing channel.



22. An automatic speaker verification method, comprising the steps of:

obtaining enrollment speech over an enrollment channel;

inverse filtering the enrollment speech to create inverse filtered
enrollment speech;

receiving test speech over a testing channel;

inverse filtering the test speech to create inverse filtered test speech;
and

determining whether the inverse filtered test speech comes from
the same person as the inverse filtered enrollment speech.

23. The automatic speaker verification method of claim 22, wherein the 
step of inverse filtering the enrollment speech comprises the step of 
creating a filter representing inverse characteristics of the enrollment 
channel. 

24. The automatic speaker verification method of claim 22, wherein the 
step of inverse filtering the test speech comprises the step of creating a 
filter representing inverse characteristics of the testing channel. 

25. An automatic speaker verification method, including the steps of: 
obtaining two or more samples of enrollment speech;

processing each sample of enrollment speech to form corresponding
utterances;

obtaining test speech;

identifying one or more key words/key phrases in the test speech,
including the steps of:

selecting a reference utterance from one of the utterances;

warping the remaining samples of the enrollment speech to
the reference utterance;

averaging one or more of the warped utterances to generate a
reference template;

calculating a dynamic time warp distortion for the reference
template and test speech; and

choosing a portion of the test utterance which has the least
dynamic time warp distortion; and

comparing the identified key words/key phrases to the enrollment
speech to determine whether the test speech and enrollment speech are
from the same person.
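By way of illustration only, the key word/key phrase identification of claim 25 may be sketched as follows. The sketch assumes one scalar feature per frame, an absolute-difference local distance, and an exhaustive search over candidate portions of the test speech whose length lies between half and twice the template length. The function names and the search range are hypothetical and are not taken from the claim.

```python
from typing import List, Tuple


def dtw(ref: List[float], other: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW distortion and the alignment path between two sequences."""
    n, m = len(ref), len(other)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - other[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def warp_to_reference(ref: List[float], other: List[float]) -> List[float]:
    """Warp `other` onto the time axis of `ref`, averaging frames that share a slot."""
    _, path = dtw(ref, other)
    buckets: List[List[float]] = [[] for _ in ref]
    for i, j in path:
        buckets[i].append(other[j])
    return [sum(b) / len(b) for b in buckets]


def reference_template(utterances: List[List[float]], ref_index: int) -> List[float]:
    """Average the reference utterance with the other utterances warped onto it."""
    ref = utterances[ref_index]
    warped = [ref] + [warp_to_reference(ref, u)
                      for k, u in enumerate(utterances) if k != ref_index]
    return [sum(col) / len(col) for col in zip(*warped)]


def spot_key_phrase(template: List[float], test: List[float]) -> Tuple[float, int, int]:
    """Return (distortion, start, end) of the test portion with least DTW distortion."""
    best = (float("inf"), 0, 0)
    n = len(template)
    for start in range(len(test)):
        for end in range(start + 1, len(test) + 1):
            if not n // 2 <= end - start <= 2 * n:
                continue  # skip portions far shorter or longer than the template
            distortion, _ = dtw(template, test[start:end])
            best = min(best, (distortion, start, end))
    return best
```

The exhaustive search is quadratic in the length of the test speech and is shown only for clarity; a practical system would restrict the candidate portions.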

26. The automatic speaker verification method of claim 25, wherein the
step of selecting a reference utterance comprises the step of: choosing the 
utterance with minimum duration. 

27. The automatic speaker verification method of claim 25, wherein the 
step of selecting a reference utterance comprises the step of: choosing an
utterance with median duration.

28. The automatic speaker verification method of claim 25, wherein the 
step of selecting a reference utterance comprises the step of: choosing an 
utterance with a duration closest to the average duration. 

29. The automatic speaker verification method of claim 25, wherein the
step of selecting a reference utterance comprises the step of: choosing an
utterance with minimum combined distortion with respect to the other
utterances.
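By way of illustration only, the four alternatives for selecting a reference utterance in claims 26 to 29 may be sketched as follows. Each function returns the index of the chosen utterance. Duration is assumed to be the number of frames, the upper middle element is taken as the median when the number of utterances is even, and `distortion` is a hypothetical caller-supplied function such as a dynamic time warp distortion. None of these names appear in the claims.

```python
from statistics import mean
from typing import Callable, List, Sequence

Utterance = Sequence[float]


def by_minimum_duration(utts: List[Utterance]) -> int:
    """Claim 26: the utterance with the fewest frames."""
    return min(range(len(utts)), key=lambda k: len(utts[k]))


def by_median_duration(utts: List[Utterance]) -> int:
    """Claim 27: the utterance whose duration is the median (upper middle if even)."""
    order = sorted(range(len(utts)), key=lambda k: len(utts[k]))
    return order[len(order) // 2]


def by_closest_to_average(utts: List[Utterance]) -> int:
    """Claim 28: the utterance whose duration is closest to the average duration."""
    avg = mean(len(u) for u in utts)
    return min(range(len(utts)), key=lambda k: abs(len(utts[k]) - avg))


def by_minimum_combined_distortion(
        utts: List[Utterance],
        distortion: Callable[[Utterance, Utterance], float]) -> int:
    """Claim 29: the utterance with the least summed distortion to all the others."""
    def total(k: int) -> float:
        return sum(distortion(utts[k], u) for j, u in enumerate(utts) if j != k)
    return min(range(len(utts)), key=total)
```

The first three criteria need only the durations, whereas the fourth requires a pairwise distortion for every pair of utterances and is therefore the most costly.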

30. An automatic speaker verification method, wherein the results of 
prior verifications are stored, including the steps of:

obtaining test speech from a user seeking authorization or
identification;

generating subwords of the test speech;

scoring the subwords against subwords of a known individual using
a plurality of modeling classifiers;

storing the results of each model classifier as a classifier score;

fusing the results of each classifier score using a fusion constant and
weighing function to generate a final score; and

comparing the final score to a threshold value to determine whether
the test speech and enrollment speech are from the known individual.



31. The automatic speaker verification method of claim 30, further 
comprising the step of: 

determining that fusion adaptation inclusion criteria are met; and

changing the fusion constant to provide more weight to the
classifier score which more accurately corresponds to the threshold value.
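By way of illustration only, the fusion of claim 30 and the fusion adaptation of claim 31 may be sketched as follows. The sketch assumes exactly two classifiers and a linear weighting in which the fusion constant `alpha` weights the first classifier score and `1 - alpha` weights the second. The adaptation rule shown, which shifts weight toward the classifier whose score lies further on the accepting side of the threshold, is only one possible reading of the claim. The names `fuse`, `verify`, `adapt_fusion_constant` and `step` are hypothetical.

```python
from typing import Tuple

Scores = Tuple[float, float]


def fuse(scores: Scores, alpha: float) -> float:
    """Combine two classifier scores into a final score (assumed linear fusion)."""
    s1, s2 = scores
    return alpha * s1 + (1.0 - alpha) * s2


def verify(scores: Scores, alpha: float, threshold: float) -> bool:
    """Accept when the final score is equal to or greater than the threshold."""
    return fuse(scores, alpha) >= threshold


def adapt_fusion_constant(alpha: float, scores: Scores, threshold: float,
                          step: float = 0.05) -> float:
    """Shift weight toward the classifier with the larger margin over the threshold."""
    margin1 = scores[0] - threshold
    margin2 = scores[1] - threshold
    if margin1 > margin2:
        return min(1.0, alpha + step)  # favor the first classifier
    if margin2 > margin1:
        return max(0.0, alpha - step)  # favor the second classifier
    return alpha                       # equal margins: leave unchanged
```

Clamping `alpha` to the range 0 to 1 keeps the final score a weighted average of the two classifier scores.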

32. The automatic speaker verification method of claim 30, further 
comprising the steps of:

determining that model adaptation inclusion criteria are met, 
including that one or more verifications have been successful; and 

training the model classifiers with previously stored enrollment 
speech and with speech corresponding to the successful verifications, 
including the steps of:

generating a new threshold value; and

storing the new threshold value.

33. The automatic speaker verification method of claim 30, further 
comprising the steps of: 

determining that threshold adaptation inclusion criteria are met;

analyzing the stored final scores;

calculating a new threshold value in response to the analyzation;
and

storing the new threshold value.
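By way of illustration only, the threshold adaptation of claim 33 may be sketched as follows. The claim does not specify the inclusion criteria or the analysis, so the sketch assumes, purely hypothetically, that adaptation is allowed once a minimum number of final scores has been stored and that the new threshold is set a fixed number of standard deviations below the mean of those scores. The names `adapt_threshold`, `min_count` and `k` are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def adapt_threshold(stored_scores: List[float], current: float,
                    min_count: int = 5, k: float = 2.0) -> float:
    """Return a new threshold from stored final scores, or `current` if too few."""
    if len(stored_scores) < min_count:
        # Assumed inclusion criterion not met: keep the existing threshold.
        return current
    # Assumed analysis: mean of the stored scores minus k standard deviations.
    return mean(stored_scores) - k * pstdev(stored_scores)
```

The returned value would then be stored in place of the previous threshold value.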



34. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;

processing the test speech to remove the effects of the test channel;
and

comparing the processed test speech with speech data from a known
user, including the steps of:

extracting features of the test speech;

generating subwords based on the extracted features;

scoring the subwords using one or more model classifiers;

fusing the results of the model classifiers to obtain a final
score; and

verifying the user if the final score is equal to or greater than
a threshold value.

35. The automatic speaker verification method of claim 34, wherein the
known speech is obtained over an enrollment channel, wherein the step
of processing further comprises the step of filtering the test speech through
a filter having characteristics of the enrollment channel, and wherein the
step of generating subwords further comprises the step of spotting one or
more key words/key phrases in the processed test speech.

36. The automatic speaker verification method of claim 34, further 
comprising the steps of: 

training the model classifiers using antispeaker data from nonusers 
and one or more enrollment speech samples from the user; 

changing the model classifiers and threshold value, including the
steps of:

determining that the user has been verified;

retraining the model classifiers, including the step of using
test speech corresponding to the verified final score as an enrollment
sample;

calculating a new threshold value based on the retrained
model classifiers.

for every frame of speech
    Evaluate the roots z_i of the linear prediction (LP) polynomial
    if abs(z_i) > a
        set abs(z_i) = a
        modify z_i to Z_i, given by Z_i = a * exp(j * arg(z_i))
    else
        Z_i = z_i
    endif
    Evaluate LPCC using z_i by c(n) = sum_{i=1}^{P} z_i^n
        P = # of LP poles
        n = 1...N. N is the cepstral order, normally 12
    Evaluate PFCC using Z_i by ĉ(n) = sum_{i=1}^{P} Z_i^n
end for

compute mean (average) of PFCC vectors to obtain channel estimate filter.
The filter is a representation of an LPC filter in the cepstral domain.

save the channel estimate filter as part of the
voice print in the voice print database.
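By way of illustration only, the pole filtering and channel estimation steps above may be sketched as follows. The sketch assumes that the roots of the LP polynomial have already been computed for each frame and are passed in as complex numbers. The function names are hypothetical. The optional `scale_by_n` flag is likewise an assumption: it applies the 1/n factor of the textbook identity c(n) = (1/n) * sum of z_i^n, which is not shown in the steps above.

```python
import cmath
from typing import List, Sequence


def pole_filter(poles: Sequence[complex], a: float) -> List[complex]:
    """Clip each pole magnitude to at most `a` while keeping its angle."""
    return [cmath.rect(a, cmath.phase(z)) if abs(z) > a else z for z in poles]


def cepstrum_from_poles(poles: Sequence[complex], order: int = 12,
                        scale_by_n: bool = False) -> List[float]:
    """Evaluate c(n) for n = 1..order as the sum of the n-th powers of the poles."""
    coeffs = []
    for n in range(1, order + 1):
        c = sum(z ** n for z in poles)
        if scale_by_n:
            c = c / n  # assumed textbook scaling, off by default
        # For conjugate pole pairs the imaginary parts cancel; keep the real part.
        coeffs.append(c.real)
    return coeffs


def channel_estimate(frames_poles: Sequence[Sequence[complex]], a: float,
                     order: int = 12) -> List[float]:
    """Average the pole-filtered cepstral vectors (PFCC) over all frames."""
    pfcc = [cepstrum_from_poles(pole_filter(p, a), order) for p in frames_poles]
    return [sum(col) / len(col) for col in zip(*pfcc)]
```

Calling `cepstrum_from_poles` on the unmodified poles yields the LPCC of a frame, and calling it on the output of `pole_filter` yields the PFCC. The vector returned by `channel_estimate` is the quantity that would be saved with the voice print.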